1.
BMC Bioinformatics ; 23(Suppl 2): 433, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36510133

ABSTRACT

BACKGROUND: Automatic functional annotation of proteins is an open research problem in bioinformatics. The growing number of protein entries in public databases, for example in UniProtKB, poses challenges for manual functional annotation. Manual annotation requires expert human curators to search and read related research articles, interpret the results, and assign the annotations to the proteins. Thus, it is a time-consuming and expensive process. Therefore, designing computational tools to perform automatic annotation leveraging the high-quality manual annotations that already exist in UniProtKB/SwissProt is an important research problem. RESULTS: In this paper, we extend and adapt the GrAPFI (graph-based automatic protein function inference) method (Sarker et al. in BMC Bioinform 21, 2020; Sarker et al., in: Proceedings of 7th international conference on complex networks and their applications, Cambridge, 2018) for automatic annotation of proteins with gene ontology (GO) terms, renaming it GrAPFI-GO. The original GrAPFI method uses label propagation in a similarity graph where proteins are linked through the domains, families, and superfamilies that they share. Here, we also explore various types of similarity measures based on common neighbors in the graph. Moreover, GO terms are arranged in a hierarchical manner according to semantic parent-child relations. Therefore, we propose an efficient pruning and post-processing technique that integrates both semantic similarity and hierarchical relations between the GO terms. We produce experimental results comparing the GrAPFI-GO method with and without considering common-neighbor similarity. We also test the performance of GrAPFI-GO and other annotation tools for GO annotation on a benchmark of proteins with and without the proposed pruning and post-processing procedure.
CONCLUSION: Our results show that the proposed semantic hierarchical post-processing potentially improves the performance of GrAPFI-GO and of other annotation tools as well. Thus, GrAPFI-GO provides an original, efficient, and reusable procedure that exploits the semantic relations among the GO terms to improve the automatic annotation of protein functions.
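
The two ideas in this abstract, similarity-weighted label propagation and hierarchy-aware pruning, can be illustrated with a minimal sketch. It is not the GrAPFI-GO implementation: the Jaccard weighting, the score threshold, the rule "keep only the most specific predicted terms", and all identifiers (D1, GO:A, P1, and so on) are assumptions chosen for illustration, and the semantic-similarity part of the published post-processing is left out.

```python
from collections import defaultdict

def jaccard(a, b):
    """Jaccard similarity between two sets of domain/family identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def propagate_labels(query_domains, annotated):
    """Score GO terms for a query protein by similarity-weighted voting.

    annotated maps protein id -> (set of domain ids, set of GO terms).
    Returns GO term -> score normalised to [0, 1].
    """
    scores = defaultdict(float)
    total = 0.0
    for domains, go_terms in annotated.values():
        weight = jaccard(query_domains, domains)
        if weight == 0.0:
            continue  # not a neighbour of the query in the similarity graph
        total += weight
        for term in go_terms:
            scores[term] += weight
    return {t: s / total for t, s in scores.items()} if total else {}

def ancestors(term, parents):
    """All ancestors of a term, given a map term -> set of parent terms."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen

def prune_to_most_specific(predicted, parents):
    """Drop predicted terms that are ancestors of another predicted term."""
    implied = set()
    for term in predicted:
        implied |= ancestors(term, parents)
    return {t for t in predicted if t not in implied}

# Hypothetical toy data: placeholder domain ids and GO terms.
annotated = {
    "P1": ({"D1", "D2"}, {"GO:B"}),
    "P2": ({"D1"}, {"GO:C"}),
    "P3": ({"D9"}, {"GO:D"}),
}
parents = {"GO:C": {"GO:B"}, "GO:B": {"GO:A"}}  # GO:C is a child of GO:B

scores = propagate_labels({"D1", "D2"}, annotated)
kept = prune_to_most_specific({t for t, s in scores.items() if s >= 0.3}, parents)
```

In this toy case the parent term GO:B is removed because its child GO:C is also predicted, which is one simple way of using parent-child relations; the published procedure may make this decision differently.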


Subjects
Computational Biology; Semantics; Humans; Gene Ontology; Molecular Sequence Annotation; Computational Biology/methods; Databases, Protein; Proteins/chemistry
2.
PLoS Comput Biol ; 17(8): e1008844, 2021 08.
Article in English | MEDLINE | ID: mdl-34370723

ABSTRACT

Many biological processes are mediated by protein-protein interactions (PPIs). Because protein domains are the building blocks of proteins, PPIs likely rely on domain-domain interactions (DDIs). Several attempts exist to infer DDIs from PPI networks but the produced datasets are heterogeneous and sometimes not accessible, while the PPI interactome data keeps growing. We describe a new computational approach called "PPIDM" (Protein-Protein Interactions Domain Miner) for inferring DDIs using multiple sources of PPIs. The approach is an extension of our previously described "CODAC" (Computational Discovery of Direct Associations using Common neighbors) method for inferring new edges in a tripartite graph. The PPIDM method has been applied to seven widely used PPI resources, using as "Gold-Standard" a set of DDIs extracted from 3D structural databases. Overall, PPIDM has produced a dataset of 84,552 non-redundant DDIs. Statistical significance (p-value) is calculated for each source of PPI and used to classify the PPIDM DDIs in Gold (9,175 DDIs), Silver (24,934 DDIs) and Bronze (50,443 DDIs) categories. Dataset comparison reveals that PPIDM has inferred from the 2017 releases of PPI sources about 46% of the DDIs present in the 2020 release of the 3did database, not counting the DDIs present in the Gold-Standard. The PPIDM dataset contains 10,229 DDIs that are consistent with more than 13,300 PPIs extracted from the IMEx database, and nearly 23,300 DDIs (27.5%) that are consistent with more than 214,000 human PPIs extracted from the STRING database. Examples of newly inferred DDIs covering more than 10 PPIs in the IMEx database are provided. Further exploitation of the PPIDM DDI reservoir includes the inventory of possible partners of a protein of interest and characterization of protein interactions at the domain level in combination with other methods. The result is publicly available at http://ppidm.loria.fr/.


Subjects
Protein Interaction Domains and Motifs; Protein Interaction Mapping/statistics & numerical data; Protein Interaction Maps; Algorithms; Computational Biology; Data Mining/statistics & numerical data; Databases, Protein/statistics & numerical data; Humans; Software
3.
J Chem Inf Model ; 62(12): 3107-3122, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35754360

ABSTRACT

Emerging SARS-CoV-2 variants raise concerns about our ability to withstand the Covid-19 pandemic, and therefore, understanding mechanistic differences between those variants is crucial. In this study, we investigate disparities between the SARS-CoV-2 wild type (WT) and five variants that emerged in late 2020, focusing on the structure and dynamics of the spike protein interface with the human angiotensin-converting enzyme 2 (ACE2) receptor, by using crystallographic structures and extended analysis of microsecond molecular dynamics simulations. Dihedral angle principal component analysis (PCA) showed strong similarities in the spike receptor binding domain (RBD) dynamics of the Alpha, Beta, Gamma, and Delta variants, in contrast with those of WT and Epsilon. Dynamical perturbation networks and contact PCA identified the peculiar interface dynamics of the Delta variant, which cannot be directly attributed to its specific L452R and T478K mutations since those residues are not in direct contact with the human ACE2 receptor. Our results show that in the Delta variant the L452R and T478K mutations act synergistically on neighboring residues to provoke drastic changes in the spike/ACE2 interface; this singular mechanism of action may explain why it dominated over preceding variants.
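
Dihedral angle PCA is commonly carried out on the cosine and sine of each angle rather than on the raw angles, so that values near +180° and -180° are treated as neighbours. The sketch below illustrates that transformation followed by extraction of the first principal component. It is not the authors' pipeline: the toy trajectory, the function names, and the use of power iteration in place of a full eigendecomposition are assumptions made to keep the example dependency-free.

```python
import math

def dihedral_features(frames):
    """Replace each dihedral angle (radians) by its (cos, sin) pair."""
    return [[f(a) for a in frame for f in (math.cos, math.sin)] for frame in frames]

def centered_covariance(rows):
    """Return the sample covariance matrix and the mean-centered rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered

def first_component(cov, iters=500):
    """Leading eigenvector of a symmetric matrix, by power iteration."""
    d = len(cov)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-uniform start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

# Hypothetical toy trajectory: 4 frames, 2 dihedrals.
# The first angle switches between two conformations; the second is constant.
frames = [[0.0, 1.0], [0.05, 1.0], [1.5, 1.0], [1.55, 1.0]]
cov, centered = centered_covariance(dihedral_features(frames))
pc1 = first_component(cov)
projections = [sum(c * p for c, p in zip(row, pc1)) for row in centered]
```

The projections of the first two frames fall on one side of zero and those of the last two on the other, which is the kind of separation between conformational states that the first component is expected to capture. The overall sign of a principal component is arbitrary.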


Subjects
COVID-19; SARS-CoV-2; Angiotensin-Converting Enzyme 2/genetics; Humans; Molecular Dynamics Simulation; Mutation; Pandemics; Protein Binding; SARS-CoV-2/genetics
4.
J Biomed Inform ; 135: 104212, 2022 11.
Article in English | MEDLINE | ID: mdl-36182054

ABSTRACT

Machine learning is now an essential part of any biomedical study, but its integration into real, effective Learning Health Systems, including the whole process of Knowledge Discovery from Data (KDD), is not yet realised. We propose an original extension of the KDD process model that involves an inductive database. We designed for the first time a generic model of Inductive Clinical DataBase (ICDB) aimed at hosting both patient data and learned models. We report experiments conducted on patient data in the frame of a project dedicated to fighting heart failure. The results show how the ICDB approach makes it possible to identify biomarker combinations, specific to and predictive of the heart fibrosis phenotype, that put forward hypotheses about underlying mechanisms. Two main scenarios were considered: a local-to-global KDD scenario and a trans-cohort alignment scenario. This promising proof of concept enables us to draw the contours of a next-generation Knowledge Discovery Environment (KDE).


Subjects
Data Mining; Knowledge Discovery; Databases, Factual
5.
Gastroenterology ; 158(1): 76-94.e2, 2020 01.
Article in English | MEDLINE | ID: mdl-31593701

ABSTRACT

Since 2010, substantial progress has been made in artificial intelligence (AI) and its application to medicine. AI is explored in gastroenterology for endoscopic analysis of lesions, in detection of cancer, and to facilitate the analysis of inflammatory lesions or gastrointestinal bleeding during wireless capsule endoscopy. AI is also tested to assess liver fibrosis and to differentiate patients with pancreatic cancer from those with pancreatitis. AI might also be used to establish prognoses of patients or predict their response to treatments, based on multiple factors. We review the ways in which AI may help physicians make a diagnosis or establish a prognosis and discuss its limitations, knowing that further randomized controlled studies will be required before the approval of AI techniques by the health authorities.


Subjects
Artificial Intelligence; Diagnosis, Computer-Assisted/methods; Gastroenterology/methods; Gastrointestinal Diseases/diagnosis; Liver Diseases/diagnosis; Clinical Decision-Making/methods; Decision Support Systems, Clinical; Decision Trees; Gastrointestinal Diseases/mortality; Gastrointestinal Diseases/therapy; Humans; Liver Diseases/mortality; Liver Diseases/therapy; Prognosis; Treatment Outcome
6.
Biomarkers ; 25(2): 201-211, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32063068

ABSTRACT

Background: Heart failure with preserved ejection fraction (HFpEF) is a heterogeneous syndrome for which clear evidence of effective therapies is lacking. Understanding which factors determine this heterogeneity may be helped by better phenotyping. An unsupervised statistical approach applied to a large set of biomarkers may identify distinct HFpEF phenotypes. Methods: Relevant proteomic biomarkers were analyzed in 392 HFpEF patients included in Metabolic Road to Diastolic HF (MEDIA-DHF). We performed an unsupervised cluster analysis to define distinct phenotypes. Cluster characteristics were explored with logistic regression. The association between clusters and 1-year cardiovascular (CV) death and/or CV hospitalization was studied using Cox regression. Results: Based on 415 biomarkers, we identified 2 distinct clusters. Clinical variables associated with cluster 2 were diabetes, impaired renal function, loop diuretics and/or betablockers. In addition, 17 biomarkers were more highly expressed in cluster 2 vs. 1. Patients in cluster 2 vs. those in 1 experienced higher rates of CV death/CV hospitalization (adj. HR 1.93, 95% CI 1.12-3.32, p = 0.017). Complex-network analyses linked these biomarkers to immune system activation, signal transduction cascades, cell interactions and metabolism. Conclusion: Unsupervised machine-learning algorithms applied to a wide range of biomarkers identified 2 HFpEF clusters with different CV phenotypes and outcomes. The identified pathways may provide a basis for future research.
Clinical significance: More insight is obtained into the mechanisms related to poor outcome in HFpEF patients, since it was demonstrated that biomarkers associated with the high-risk cluster were related to the immune system, signal transduction cascades, cell interactions and metabolism. Biomarkers (and pathways) identified in this study may help select high-risk HFpEF patients, which could be helpful for the inclusion/exclusion of patients in future trials. Our findings may be the basis for investigating therapies specifically targeting these pathways, and for the potential use of corresponding markers to identify patients with distinct mechanistic bioprofiles who are most likely to respond to the selected mechanistically targeted therapies.
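
As a reading aid for the reported hazard ratio (adj. HR 1.93, 95% CI 1.12-3.32, p = 0.017), the sketch below recovers the standard error and p-value implied by the confidence interval. It assumes the interval is a symmetric Wald interval on the log scale, which is the usual convention for Cox regression but is not stated in the abstract; the function name is hypothetical.

```python
import math

def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Standard error of log(HR), z statistic and two-sided p-value
    implied by a hazard ratio and its 95% confidence interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p

se, z, p = wald_from_ci(1.93, 1.12, 3.32)
print(f"SE(log HR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

The p-value recovered this way is close to the reported 0.017. Exact agreement is not expected, because the published hazard ratio and interval bounds are rounded to two decimals.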


Subjects
Heart Failure/physiopathology; Phenotype; Aged; Biomarkers/analysis; Cluster Analysis; Female; Humans; Machine Learning; Male; Middle Aged; Proteomics; Stroke Volume
7.
BMC Bioinformatics ; 19(Suppl 14): 413, 2018 Nov 20.
Article in English | MEDLINE | ID: mdl-30453875

ABSTRACT

BACKGROUND: Families of related proteins and their different functions may be described systematically using common classifications and ontologies such as Pfam and GO (Gene Ontology), for example. However, many proteins consist of multiple domains, and each domain, or some combination of domains, can be responsible for a particular molecular function. Therefore, identifying which domains should be associated with a specific function is a non-trivial task. RESULTS: We describe a general approach for the computational discovery of associations between different sets of annotations by formalising the problem as a bipartite graph enrichment problem in the setting of a tripartite graph. We call this approach "CODAC" (for COmputational Discovery of Direct Associations using Common Neighbours). As one application of this approach, we describe "GODomainMiner" for associating GO terms with protein domains. We used GODomainMiner to predict GO-domain associations between each of the 3 GO ontology namespaces (MF, BP, and CC) and the Pfam, CATH, and SCOP domain classifications. Overall, GODomainMiner yields average enrichments of 15-, 41- and 25-fold GO-domain associations compared to the existing GO annotations in these 3 domain classifications, respectively. CONCLUSIONS: These associations could potentially be used to annotate many of the protein chains in the Protein Databank and protein sequences in UniProt whose domain composition is known but which currently lack GO annotation.
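
The common-neighbour idea can be sketched on a toy tripartite graph in which proteins link domains on one side to GO terms on the other. The abstract does not give the CODAC scoring function, so the cosine-style score below is an assumption, as are the function name and all identifiers.

```python
import math
from itertools import product

def common_neighbour_scores(domain_to_proteins, term_to_proteins):
    """Score each (domain, term) pair by the cosine similarity of the
    protein sets attached to the domain and to the term."""
    scores = {}
    pairs = product(domain_to_proteins.items(), term_to_proteins.items())
    for (domain, dp), (term, tp) in pairs:
        shared = len(dp & tp)  # proteins that are common neighbours
        if shared:
            scores[(domain, term)] = shared / math.sqrt(len(dp) * len(tp))
    return scores

# Hypothetical toy data: which proteins carry each domain / each GO term.
domain_to_proteins = {"D1": {"p1", "p2", "p3"}, "D2": {"p3", "p4"}}
term_to_proteins = {"T1": {"p1", "p2"}, "T2": {"p4"}}

scores = common_neighbour_scores(domain_to_proteins, term_to_proteins)
```

Pairs with no shared protein, such as (D1, T2), receive no score at all. A real application would also need a threshold or significance test to decide which scored pairs count as associations, which this sketch leaves out.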


Subjects
Computational Biology/methods; Gene Ontology; Proteins/chemistry; Algorithms; Amino Acid Sequence; Area Under Curve; Databases, Protein; Molecular Sequence Annotation; Protein Domains
8.
BMC Bioinformatics ; 18(1): 107, 2017 Feb 13.
Article in English | MEDLINE | ID: mdl-28193156

ABSTRACT

BACKGROUND: Many entries in the Protein Data Bank (PDB) are annotated to show their component protein domains according to the Pfam classification, as well as their biological function through the Enzyme Commission (EC) numbering scheme. However, despite the fact that the biological activity of many proteins often arises from specific domain-domain and domain-ligand interactions, current on-line resources rarely provide a direct mapping from structure to function at the domain level. Since the PDB now contains many tens of thousands of protein chains, and since protein sequence databases can dwarf such numbers by orders of magnitude, there is a pressing need to develop automatic structure-function annotation tools which can operate at the domain level. RESULTS: This article presents ECDomainMiner, a novel content-based filtering approach to automatically infer associations between EC numbers and Pfam domains. ECDomainMiner finds a total of 20,728 non-redundant EC-Pfam associations with an F-measure of 0.95 with respect to a "Gold Standard" test set extracted from InterPro. Compared to the 1515 manually curated EC-Pfam associations in InterPro, ECDomainMiner infers a 13-fold increase in the number of EC-Pfam associations. CONCLUSION: These EC-Pfam associations could be used to annotate some 58,722 protein chains in the PDB which currently lack any EC annotation. The ECDomainMiner database is publicly available at http://ecdm.loria.fr/.
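
The two summary figures in this abstract can be reproduced with a few lines of arithmetic. The F-measure below is the standard harmonic mean of precision and recall; the true-positive, false-positive and false-negative counts passed to it are hypothetical, since the abstract reports only the resulting value of 0.95.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical counts that would yield the reported F-measure of 0.95.
example_f = f_measure(tp=95, fp=5, fn=5)

# Fold increase from the two association counts given in the abstract.
fold_increase = 20728 / 1515
print(f"F = {example_f:.2f}, fold increase = {fold_increase:.2f}")
```

The ratio of 20,728 inferred to 1515 curated associations is a little under 14, which matches the "13-fold" figure when rounded down.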


Subjects
Computational Biology/methods; Data Mining/methods; Databases, Protein; Enzymes; Proteins; Enzymes/chemistry; Enzymes/genetics; Enzymes/metabolism; Proteins/chemistry; Proteins/genetics; Proteins/metabolism
9.
Proteins ; 85(3): 463-469, 2017 03.
Article in English | MEDLINE | ID: mdl-27701764

ABSTRACT

Many of the modeling targets in the blind CASP-11/CAPRI-30 experiment were protein homo-dimers and homo-tetramers. Here, we perform a retrospective docking-based analysis of the perfectly symmetrical CAPRI Round 30 targets whose crystal structures have been published. Starting from the CASP "stage-2" fold prediction models, we show that using our recently developed "SAM" polar Fourier symmetry docking algorithm combined with NAMD energy minimization often gives acceptable or better 3D models of the target complexes. We also use SAM to analyze the overall quality of all CASP structural models for the selected targets from a docking-based perspective. We demonstrate that docking only CASP "center" structures for the selected targets provides a fruitful and economical docking strategy. Furthermore, our results show that many of the CASP models are dockable in the sense that they can lead to acceptable or better models of symmetrical complexes. Even though SAM is very fast, using docking and NAMD energy minimization to pull out acceptable docking models from a large ensemble of docked CASP models is computationally expensive. Nonetheless, thanks to our SAM docking algorithm, we expect that applying our docking protocol on a modern computer cluster will give us the ability to routinely model 3D structures of symmetrical protein complexes from CASP-quality models. Proteins 2017; 85:463-469. © 2016 Wiley Periodicals, Inc.


Subjects
Algorithms; Computational Biology/methods; Molecular Docking Simulation/methods; Proteins/chemistry; Software; Amino Acid Motifs; Benchmarking; Binding Sites; Crystallography, X-Ray; Protein Binding; Protein Conformation; Protein Interaction Mapping; Protein Multimerization; Research Design; Structural Homology, Protein; Thermodynamics
10.
Proteins ; 84 Suppl 1: 323-48, 2016 09.
Article in English | MEDLINE | ID: mdl-27122118

ABSTRACT

We present the results for CAPRI Round 30, the first joint CASP-CAPRI experiment, which brought together experts from the protein structure prediction and protein-protein docking communities. The Round comprised 25 targets from amongst those submitted for the CASP11 prediction experiment of 2014. The targets included mostly homodimers, a few homotetramers, and two heterodimers, and comprised protein chains that could readily be modeled using templates from the Protein Data Bank. On average 24 CAPRI groups and 7 CASP groups submitted docking predictions for each target, and 12 CAPRI groups per target participated in the CAPRI scoring experiment. In total more than 9500 models were assessed against the 3D structures of the corresponding target complexes. Results show that the prediction of homodimer assemblies by homology modeling techniques and docking calculations is quite successful for targets featuring large enough subunit interfaces to represent stable associations. Targets with ambiguous or inaccurate oligomeric state assignments, often featuring crystal contact-sized interfaces, represented a confounding factor. For those, a much poorer prediction performance was achieved, while nonetheless often providing helpful clues on the correct oligomeric state of the protein. The prediction performance was very poor for genuine tetrameric targets, where the inaccuracy of the homology-built subunit models and the smaller pair-wise interfaces severely limited the ability to derive the correct assembly mode. Our analysis also shows that docking procedures tend to perform better than standard homology modeling techniques and that highly accurate models of the protein components are not always required to identify their association modes with acceptable accuracy. Proteins 2016; 84(Suppl 1):323-348. © 2016 Wiley Periodicals, Inc.


Subjects
Computational Biology/statistics & numerical data; Models, Statistical; Molecular Docking Simulation; Molecular Dynamics Simulation; Proteins/chemistry; Software; Algorithms; Amino Acid Motifs; Bacteria/chemistry; Binding Sites; Computational Biology/methods; Humans; International Cooperation; Internet; Protein Binding; Protein Conformation, alpha-Helical; Protein Conformation, beta-Strand; Protein Folding; Protein Interaction Domains and Motifs; Protein Multimerization; Protein Structure, Tertiary; Sequence Homology, Amino Acid; Thermodynamics
11.
Nucleic Acids Res ; 42(Database issue): D389-95, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24271397

ABSTRACT

Comparing, classifying and modelling protein structural interactions can enrich our understanding of many biomolecular processes. This contribution describes Kbdock (http://kbdock.loria.fr/), a database system that combines the Pfam domain classification with coordinate data from the PDB to analyse and model 3D domain-domain interactions (DDIs). Kbdock can be queried using Pfam domain identifiers, protein sequences or 3D protein structures. For a given query domain or pair of domains, Kbdock retrieves and displays a non-redundant list of homologous DDIs or domain-peptide interactions in a common coordinate frame. Kbdock may also be used to search for and visualize interactions involving different, but structurally similar, Pfam families. Thus, structural DDI templates may be proposed even when there is little or no sequence similarity to the query domains.


Subjects
Databases, Protein; Protein Interaction Domains and Motifs; Binding Sites; Internet; Models, Molecular; Molecular Docking Simulation; Protein Interaction Mapping; Protein Interaction Maps; Proteins/classification; Sequence Alignment; Sequence Analysis, Protein
12.
BMC Bioinformatics ; 14: 207, 2013 Jun 26.
Article in English | MEDLINE | ID: mdl-23802887

ABSTRACT

BACKGROUND: Drug side effects represent a common reason for stopping drug development during clinical trials. Improving our ability to understand drug side effects is necessary to reduce attrition rates during drug development as well as the risk of discovering novel side effects in available drugs. Today, most investigations deal with isolated side effects and overlook possible redundancy and their frequent co-occurrence. RESULTS: In this work, drug annotations are collected from the SIDER and DrugBank databases. Terms describing individual side effects reported in SIDER are clustered with a semantic similarity measure into term clusters (TCs). Maximal frequent itemsets are extracted from the resulting drug × TC binary table, leading to the identification of what we call side-effect profiles (SEPs). A SEP is defined as the longest combination of TCs which are shared by a significant number of drugs. Frequent SEPs are explored on the basis of integrated drug and target descriptors using two machine learning methods: decision trees and inductive logic programming. Although both methods yield explicit models, the inductive logic programming method performs relational learning and is able to exploit not only drug properties but also background knowledge. Learning efficiency is evaluated by cross-validation and direct testing with new molecules. Comparison of the two machine-learning methods shows that the inductive logic programming method displays a greater sensitivity than decision trees and successfully exploits background knowledge such as functional annotations and pathways of drug targets, thereby producing rich and expressive rules. All models and theories are available on a dedicated web site. CONCLUSIONS: Side-effect profiles covering a significant number of drugs have been extracted from a drug × side-effect association table.
Integration of background knowledge concerning both chemical and biological spaces has been combined with a relational learning method for discovering rules which explicitly characterize drug-SEP associations. These rules are successfully used for predicting SEPs associated with new drugs.
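
The extraction of maximal frequent itemsets from a drug × TC binary table can be sketched as follows. This is a brute-force illustration suited only to tiny tables, not the mining algorithm used in the paper; the drug names, term-cluster labels and support threshold are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(rows, min_support):
    """Enumerate itemsets contained in at least min_support rows.

    rows maps a drug name to the set of term clusters (TCs) it is annotated with.
    """
    items = sorted(set().union(*rows.values()))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for tcs in rows.values() if candidate <= tcs)
            if support >= min_support:
                frequent[candidate] = support
                found = True
        if not found:
            break  # no frequent set of this size, so none of any larger size
    return frequent

def maximal(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return {s: n for s, n in frequent.items()
            if not any(s < other for other in frequent)}

# Hypothetical drug x TC table, written as one set of TCs per drug.
rows = {
    "drugA": {"TC1", "TC2", "TC3"},
    "drugB": {"TC1", "TC2"},
    "drugC": {"TC1", "TC2", "TC4"},
    "drugD": {"TC4"},
}
profiles = maximal(frequent_itemsets(rows, min_support=2))
```

With a support threshold of two drugs, the toy table yields two profiles: {TC1, TC2}, shared by three drugs, and {TC4}, shared by two. The single clusters TC1 and TC2 are frequent but not maximal, because the pair that contains them is also frequent.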


Subjects
Artificial Intelligence; Computational Biology/methods; Drug-Related Side Effects and Adverse Reactions; Databases, Pharmaceutical; Decision Trees; Reproducibility of Results
13.
Proteins ; 81(12): 2150-8, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24123156

ABSTRACT

Protein docking algorithms aim to calculate the three-dimensional (3D) structure of a protein complex starting from its unbound components. Although ab initio docking algorithms are improving, there is a growing need to use homology modeling techniques to exploit the rapidly increasing volumes of structural information that now exist. However, most current homology modeling approaches involve finding a pair of complete single-chain structures in a homologous protein complex to use as a 3D template, despite the fact that protein complexes are often formed from one or more domain-domain interactions (DDIs). To model 3D protein complexes by domain-domain homology, we have developed a case-based reasoning approach called KBDOCK which systematically identifies and reuses domain family binding sites from our database of nonredundant DDIs. When tested on 54 protein complexes from the Protein Docking Benchmark, our approach provides a near-perfect way to model single-domain protein complexes when full-homology templates are available, and it extends our ability to model more difficult cases when only partial or incomplete templates exist. These promising early results highlight the need for a new and diverse docking benchmark set, specifically designed to assess homology docking approaches.


Subjects
Computational Biology/methods; Protein Interaction Mapping/methods; Proteins/chemistry; Algorithms; Binding Sites; Databases, Protein; Models, Molecular; Molecular Docking Simulation; Programming Languages; Protein Conformation; Software
14.
Bioinform Adv ; 3(1): vbad081, 2023.
Article in English | MEDLINE | ID: mdl-37431435

ABSTRACT

Motivation: Protein domains can be viewed as building blocks, essential for understanding structure-function relationships in proteins. However, each domain database classifies protein domains using its own methodology. Thus, in many cases, domain models and boundaries differ from one domain database to the other, raising the question of domain definition and enumeration of true domain instances. Results: We propose an automated iterative workflow to assess protein domain classification by cross-mapping domain structural instances between domain databases and by evaluating structural alignments. CroMaSt (for Cross-Mapper of domain Structural instances) classifies all experimental structural instances of a given domain type into four different categories ('Core', 'True', 'Domain-like' and 'Failed'). CroMaSt is developed in Common Workflow Language and takes advantage of two well-known domain databases with wide coverage: Pfam and CATH. It uses the Kpax structural alignment tool with expert-adjusted parameters. CroMaSt was tested with the RNA Recognition Motif domain type and identifies 962 'True' and 541 'Domain-like' structural instances for this domain type. This method solves a crucial issue in domain-centric research and can generate essential information that could be used for synthetic biology and machine-learning approaches to protein domain engineering. Availability and implementation: The workflow and the Results archive for the CroMaSt runs presented in this article are available from WorkflowHub (doi: 10.48546/workflowhub.workflow.390.2). Supplementary information: Supplementary data are available at Bioinformatics Advances online.

15.
J Biomed Semantics ; 14(1): 7, 2023 Jul 01.
Article in English | MEDLINE | ID: mdl-37393296

ABSTRACT

The current rise of Open Science and Reproducibility in the Life Sciences requires the creation of rich, machine-actionable metadata in order to better share and reuse biological digital resources such as datasets, bioinformatics tools, training materials, etc. For this purpose, FAIR principles have been defined for both data and metadata and adopted by large communities, leading to the definition of specific metrics. However, automatic FAIRness assessment is still difficult because computational evaluations frequently require technical expertise and can be time-consuming. As a first step to address these issues, we propose FAIR-Checker, a web-based tool to assess the FAIRness of metadata presented by digital resources. FAIR-Checker offers two main facets: a "Check" module providing a thorough metadata evaluation and recommendations, and an "Inspect" module which assists users in improving metadata quality and therefore the FAIRness of their resource. FAIR-Checker leverages Semantic Web standards and technologies such as SPARQL queries and SHACL constraints to automatically assess FAIR metrics. Users are notified of missing, necessary, or recommended metadata for various resource categories. We evaluate FAIR-Checker in the context of improving the FAIRification of individual resources, through better metadata, as well as analyzing the FAIRness of more than 25 thousand bioinformatics software descriptions.


Subjects
Biological Science Disciplines; Pattern Recognition, Automated; Reproducibility of Results; Semantic Web; Computational Biology
16.
Sci Rep ; 13(1): 3643, 2023 03 04.
Article in English | MEDLINE | ID: mdl-36871056

ABSTRACT

The search for an effective drug is still urgent for COVID-19 as no drug with proven clinical efficacy is available. Finding a new purpose for an approved or investigational drug, known as drug repurposing, has become increasingly popular in recent years. We propose here a new drug repurposing approach for COVID-19, based on knowledge graph (KG) embeddings. Our approach learns "ensemble embeddings" of entities and relations in a COVID-19 centric KG, in order to get a better latent representation of the graph elements. Ensemble KG-embeddings are subsequently used in a deep neural network trained for discovering potential drugs for COVID-19. Compared to related works, we retrieve more in-trial drugs among our top-ranked predictions, thus giving greater confidence in our prediction for out-of-trial drugs. For the first time to our knowledge, molecular docking is then used to evaluate the predictions obtained from drug repurposing using KG embedding. We show that Fosinopril is a potential ligand for the SARS-CoV-2 nsp13 target. We also provide explanations of our predictions thanks to rules extracted from the KG and instantiated by KG-derived explanatory paths. Molecular evaluation and explanatory paths bring reliability to our results and constitute new complementary and reusable methods for assessing KG-based drug repurposing.


Subjects
COVID-19; Humans; SARS-CoV-2; Drug Repositioning; Molecular Docking Simulation; Pattern Recognition, Automated; Reproducibility of Results; Learning
17.
Open Res Eur ; 3: 97, 2023.
Article in English | MEDLINE | ID: mdl-37645489

ABSTRACT

Background: Data management is fast becoming an essential part of scientific practice, driven by open science and FAIR (findable, accessible, interoperable, and reusable) data sharing requirements. Whilst data management plans (DMPs) are clear to data management experts and data stewards, their purpose and creation are often obscure to the producers of the data, who in academic environments are often PhD students. Methods: Within the RNAct EU Horizon 2020 ITN project, we engaged the 10 RNAct early-stage researchers (ESRs) in a training project aimed at formulating a DMP. To do so, we used the Data Stewardship Wizard (DSW) framework and modified the existing Life Sciences Knowledge Model into a simplified version aimed at training young scientists, with computational or experimental backgrounds, in core data management principles. We collected feedback from the ESRs during this exercise. Results: Here, we introduce our new life-sciences training DMP template for young scientists. We report and discuss our experiences as principal investigators (PIs) and ESRs during this project and address the typical difficulties that are encountered in developing and understanding a DMP. Conclusions: We found that the DS-Wizard can also be an appropriate tool for DMP training, to get terminology and concepts across to researchers. Full training additionally requires an upstream step to present basic DMP concepts and a downstream step to publish a dataset in a (public) repository. Overall, the DS-Wizard tool was essential for our DMP training and we hope our efforts can be used in other projects.

18.
Bioinformatics ; 27(20): 2820-7, 2011 Oct 15.
Article in English | MEDLINE | ID: mdl-21873637

ABSTRACT

MOTIVATION: In recent years, much structural information on protein domains and their pair-wise interactions has been made available in public databases. However, it is not yet clear how best to use this information to discover general rules or interaction patterns about structural protein-protein interactions. Improving our ability to detect and exploit structural interaction patterns will help to provide a better 3D picture of the known protein interactome, and will help to guide docking-based predictions of the 3D structures of unsolved protein complexes. RESULTS: This article presents KBDOCK, a 3D database approach for spatially clustering protein binding sites and for performing template-based (knowledge-based) protein docking. KBDOCK combines residue contact information from the 3DID database with the Pfam protein domain family classification together with coordinate data from the Protein Data Bank. This allows the 3D configurations of all known hetero domain-domain interactions to be superposed and clustered for each Pfam family. We find that most Pfam domain families have up to four hetero binding sites, and over 60% of all domain families have just one hetero binding site. The utility of this approach for template-based docking is demonstrated using 73 complexes from the Protein Docking Benchmark. Overall, up to 45 out of 73 complexes may be modelled by direct homology to existing domain interfaces, and key binding site information is found for 24 of the 28 remaining complexes. These results show that KBDOCK can often provide useful information for predicting the structures of unknown protein complexes. AVAILABILITY: http://kbdock.loria.fr/ CONTACT: Dave.Ritchie@inria.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Binding Sites , Databases, Protein , Humans , Models, Molecular , Multiprotein Complexes/chemistry
19.
Nucleic Acids Res ; 38(Web Server issue): W445-9, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20444869

ABSTRACT

HexServer (http://hexserver.loria.fr/) is the first fast Fourier transform (FFT)-based protein docking server to be powered by graphics processors. Using two graphics processors simultaneously, a typical 6D docking run takes approximately 15 s, which is up to two orders of magnitude faster than conventional FFT-based docking approaches using comparable resolution and scoring functions. The server requires two protein structures in PDB format to be uploaded, and it produces a ranked list of up to 1000 docking predictions. Knowledge of one or both protein binding sites may be used to focus and shorten the calculation when such information is available. The first 20 predictions may be accessed individually, and a single file of all predicted orientations may be downloaded as a compressed multi-model PDB file. The server is publicly available and does not require any registration or identification by the user.
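
As a rough illustration of why FFT-based docking is fast, the sketch below scores every cyclic translation of one 3D grid against another in a single pass using the correlation theorem. It is a generic Cartesian-grid toy, not HexServer's actual algorithm or scoring function, which the abstract does not describe; the function name `best_translation` and the binary occupancy grids are assumptions made for illustration only.

```python
import numpy as np

def best_translation(receptor, ligand):
    """Find the cyclic shift of `ligand` that best overlaps `receptor`.

    Both inputs are 3D arrays of the same shape (hypothetical occupancy grids).
    """
    # Correlation theorem: IFFT(FFT(a) * conj(FFT(b)))[t] = sum_x a[x + t] * b[x],
    # so one forward/backward FFT pair scores every translation t at once.
    scores = np.fft.ifftn(np.fft.fftn(receptor) * np.conj(np.fft.fftn(ligand))).real
    shift = np.unravel_index(np.argmax(scores), scores.shape)
    return tuple(int(s) for s in shift), float(scores.max())

# Toy example: the "receptor" is simply the ligand moved by a known offset.
ligand = np.zeros((8, 8, 8))
ligand[0:2, 0:2, 0:2] = 1.0
receptor = np.roll(ligand, shift=(3, 1, 5), axis=(0, 1, 2))
shift, score = best_translation(receptor, ligand)
```

A full 6D search would repeat this kind of translational scan over many ligand rotations, which is where the cost, and the benefit of GPU acceleration, comes from.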


Subjects
Multiprotein Complexes/chemistry , Software , Algorithms , Binding Sites , Computer Graphics , Internet , Protein Conformation , User-Computer Interface
20.
JACC Cardiovasc Imaging ; 15(2): 193-208, 2022 02.
Article in English | MEDLINE | ID: mdl-34538625

ABSTRACT

OBJECTIVES: This study sought to identify homogeneous echocardiographic phenotypes in community-based cohorts and assess their association with outcomes. BACKGROUND: Asymptomatic cardiac dysfunction leads to a high risk of long-term cardiovascular morbidity and mortality; however, better echocardiographic classification of asymptomatic individuals remains a challenge. METHODS: Echocardiographic phenotypes were identified using K-means clustering in the first generation of the STANISLAS (Yearly non-invasive follow-up of Health status of Lorraine insured inhabitants) cohort (N = 827; mean age: 60 ± 5 years; men: 48%), and their associations with vascular function and circulating biomarkers were also assessed. These phenotypes were externally validated in the Malmö Preventive Project cohort (N = 1,394; mean age: 67 ± 6 years; men: 70%), and their associations with the composite of cardiovascular mortality (CVM) or heart failure hospitalization (HFH) were assessed as well. RESULTS: Three echocardiographic phenotypes were identified as "mostly normal (MN)" (n = 334), "diastolic changes (D)" (n = 323), and "diastolic changes with structural remodeling (D/S)" (n = 170). The D and D/S phenotypes had similar ages, body mass indices, cardiovascular risk factors, vascular impairments, and diastolic function changes. The D phenotype consisted mainly of women and featured increased levels of inflammatory biomarkers, whereas the D/S phenotype consisted predominantly of men and displayed the highest values of left ventricular mass, volume, and remodeling biomarkers. The phenotypes were predicted based on a simple algorithm including e', left ventricular mass and volume (e'VM algorithm). In the Malmö cohort, subgroups derived from the e'VM algorithm were significantly associated with a higher risk of CVM and HFH (adjusted HR in the D phenotype = 1.87; 95% CI: 1.04 to 3.37; adjusted HR in the D/S phenotype = 3.02; 95% CI: 1.71 to 5.34).
CONCLUSIONS: Among asymptomatic, middle-aged individuals, echocardiographic data-driven classification based on the simple e'VM algorithm identified profiles with different long-term HF risk. (4th Visit at 17 Years of Cohort STANISLAS-Stanislas Ancillary Study ESCIF [STANISLASV4]; NCT01391442).
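
For readers unfamiliar with the clustering step named in the methods, the following is a minimal sketch of K-means (Lloyd's algorithm) in NumPy. It is a generic illustration only: the abstract does not state which echocardiographic variables, preprocessing, or software the authors used, and the function name `kmeans`, its parameters, and the toy data are all assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: returns (labels, centroids) for data X of shape (n, d)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points (assumed simple init, not k-means++).
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: mean of each cluster; an empty cluster keeps its old centroid.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Final assignment against the final centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids

# Toy data: two well-separated groups of hypothetical, already-scaled measurements.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centroids = kmeans(X, k=2)
```

In practice the input columns would typically be standardised first, since K-means relies on Euclidean distance and is sensitive to differences in scale between variables.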


Subjects
Echocardiography , Heart Failure , Aged , Female , Heart Failure/diagnostic imaging , Heart Failure/epidemiology , Humans , Incidence , Machine Learning , Male , Middle Aged , Phenotype , Predictive Value of Tests , Prognosis , Stroke Volume , Ventricular Function, Left