Results 1 - 20 of 28
1.
BMC Bioinformatics ; 25(1): 129, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38532339

ABSTRACT

BACKGROUND: The RNA-Recognition motif (RRM) is a protein domain that binds single-stranded RNA (ssRNA) and is present in as much as 2% of the human genome. Despite this important role in biology, RRM-ssRNA interactions are very challenging to study on the structural level because of the remarkable flexibility of ssRNA. In the absence of atomic-level experimental data, the only method able to predict the 3D structure of protein-ssRNA complexes with any degree of accuracy is ssRNA'TTRACT, an ssRNA fragment-based docking approach using ATTRACT. However, since ATTRACT parameters are not ssRNA-specific and were determined in 2010, there is substantial opportunity for enhancement. RESULTS: Here we present HIPPO, a composite RRM-ssRNA scoring potential derived analytically from contact frequencies in near-native versus non-native docking models. HIPPO consists of a consensus of four distinct potentials, each extracted from a distinct reference pool of protein-trinucleotide docking decoys. To score a docking pose with one potential, for each pair of RNA-protein coarse-grained bead types, each contact is awarded or penalised according to the relative frequencies of this contact distance range among the correct and incorrect poses of the reference pool. Validated on a fragment-based docking benchmark of 57 experimentally solved RRM-ssRNA complexes, HIPPO achieved a threefold or higher enrichment for half of the fragments, versus only a quarter with the ATTRACT scoring function. In particular, HIPPO drastically improved the chance of very high enrichment (12-fold or higher), a scenario where the incremental modelling of entire ssRNA chains from fragments becomes viable. However, for the latter result, more research is needed to make it directly practically applicable. Regardless, our approach already improves upon the state of the art in RRM-ssRNA modelling and is in principle extendable to other types of protein-nucleic acid interactions.


Subjects
Proteins, RNA, Humans, Protein Binding, Proteins/chemistry, RNA/chemistry, Molecular Docking Simulation, Protein Conformation
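The HIPPO abstract describes each single potential as awarding or penalising every contact according to how often that bead-pair/distance-range combination occurs among correct versus incorrect reference poses. One natural reading of that rule is a log-odds statistical potential. The Python sketch below assumes that form, together with hypothetical distance bins, bead-type names and a pseudocount; it is not the published HIPPO parameterisation, and it leaves out the four-potential consensus.

```python
import math
from collections import Counter

# Hypothetical contact-distance bins in Angstrom; the abstract does not give HIPPO's actual ranges.
BIN_EDGES = [0.0, 4.0, 6.0, 8.0]


def distance_bin(d):
    """Index of the distance range containing d, or None if d lies outside every range."""
    for i in range(len(BIN_EDGES) - 1):
        if BIN_EDGES[i] <= d < BIN_EDGES[i + 1]:
            return i
    return None


def count_contacts(poses):
    """Count (protein bead type, RNA bead type, distance bin) occurrences over a pose pool.

    Each pose is a list of (protein_bead_type, rna_bead_type, distance) tuples.
    """
    counts = Counter()
    for pose in poses:
        for p_type, r_type, d in pose:
            b = distance_bin(d)
            if b is not None:
                counts[(p_type, r_type, b)] += 1
    return counts


def derive_potential(correct_poses, incorrect_poses, pseudocount=1.0):
    """Log-odds of each contact class in correct vs incorrect poses (assumed functional form)."""
    good = count_contacts(correct_poses)
    bad = count_contacts(incorrect_poses)
    keys = set(good) | set(bad)
    # Pseudocounts keep contacts seen in only one pool from producing log(0).
    n_good = sum(good.values()) + pseudocount * len(keys)
    n_bad = sum(bad.values()) + pseudocount * len(keys)
    return {
        key: math.log(((good[key] + pseudocount) / n_good) / ((bad[key] + pseudocount) / n_bad))
        for key in keys
    }


def score_pose(pose, potential):
    """Sum of per-contact awards (>0) and penalties (<0); higher means more native-like."""
    total = 0.0
    for p_type, r_type, d in pose:
        b = distance_bin(d)
        if b is not None:
            total += potential.get((p_type, r_type, b), 0.0)
    return total
```

Calling `score_pose` on one docking pose's contact list gives that pose's score under a single potential; how HIPPO merges its four potentials into a consensus is not detailed in the abstract and is not modelled here.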
2.
J Biomed Inform ; 135: 104212, 2022 11.
Article in English | MEDLINE | ID: mdl-36182054

ABSTRACT

Machine learning is now an essential part of any biomedical study, but its integration into real, effective Learning Health Systems, including the whole process of Knowledge Discovery from Data (KDD), is not yet realised. We propose an original extension of the KDD process model that involves an inductive database. We designed, for the first time, a generic model of Inductive Clinical DataBase (ICDB) aimed at hosting both patient data and learned models. We report experiments conducted on patient data in the framework of a project dedicated to fighting heart failure. The results show how the ICDB approach makes it possible to identify biomarker combinations, specific and predictive of the heart fibrosis phenotype, that put forward hypotheses about underlying mechanisms. Two main scenarios were considered: a local-to-global KDD scenario and a trans-cohort alignment scenario. This promising proof of concept enables us to outline a next-generation Knowledge Discovery Environment (KDE).


Subjects
Data Mining, Knowledge Discovery, Databases, Factual
3.
Gastroenterology ; 158(1): 76-94.e2, 2020 01.
Article in English | MEDLINE | ID: mdl-31593701

ABSTRACT

Since 2010, substantial progress has been made in artificial intelligence (AI) and its application to medicine. AI is explored in gastroenterology for endoscopic analysis of lesions, in detection of cancer, and to facilitate the analysis of inflammatory lesions or gastrointestinal bleeding during wireless capsule endoscopy. AI is also tested to assess liver fibrosis and to differentiate patients with pancreatic cancer from those with pancreatitis. AI might also be used to establish prognoses of patients or predict their response to treatments, based on multiple factors. We review the ways in which AI may help physicians make a diagnosis or establish a prognosis and discuss its limitations, knowing that further randomized controlled studies will be required before the approval of AI techniques by the health authorities.


Subjects
Artificial Intelligence, Diagnosis, Computer-Assisted/methods, Gastroenterology/methods, Gastrointestinal Diseases/diagnosis, Liver Diseases/diagnosis, Clinical Decision-Making/methods, Decision Support Systems, Clinical, Decision Trees, Gastrointestinal Diseases/mortality, Gastrointestinal Diseases/therapy, Humans, Liver Diseases/mortality, Liver Diseases/therapy, Prognosis, Treatment Outcome
4.
BMC Med Inform Decis Mak ; 21(1): 171, 2021 05 26.
Article in English | MEDLINE | ID: mdl-34039343

ABSTRACT

BACKGROUND: Adverse drug reactions (ADRs) are statistically characterized within randomized clinical trials and postmarketing pharmacovigilance, but their molecular mechanism remains unknown in most cases. This is true even for hepatic or skin toxicities, which are classically monitored during drug design. Aside from clinical trials, many elements of knowledge about drug ingredients are available in open-access knowledge graphs, such as their properties, interactions, or involvements in pathways. In addition, drug classifications that label drugs as either causative or not for several ADRs have been established. METHODS: In this paper, we propose to mine knowledge graphs to identify biomolecular features that may enable automatically reproducing expert classifications that distinguish drugs causative or not for a given type of ADR. From an Explainable AI perspective, we explore simple classification techniques such as Decision Trees and Classification Rules because they provide human-readable models, which explain the classification itself but may also provide elements of explanation for the molecular mechanisms behind ADRs. In summary, (1) we mine a knowledge graph for features; (2) we train classifiers to distinguish, on the basis of extracted features, drugs associated or not with two commonly monitored ADRs: drug-induced liver injuries (DILI) and severe cutaneous adverse reactions (SCAR); (3) we isolate features that are both efficient in reproducing expert classifications and interpretable by experts (i.e., Gene Ontology terms, drug targets, or pathway names); and (4) we manually evaluate in a mini-study how they may be explanatory. RESULTS: Extracted features reproduce, with good fidelity, the classifications of drugs causative or not for DILI and SCAR (Accuracy = 0.74 and 0.81, respectively).
Experts fully agreed that 73% and 38% of the most discriminative features are possibly explanatory for DILI and SCAR, respectively; and partially agreed (2/3) for 90% and 77% of them. CONCLUSION: Knowledge graphs provide sufficiently diverse features to enable simple and explainable models to distinguish between drugs that are causative or not for ADRs. In addition to explaining classifications, most discriminative features appear to be good candidates for investigating ADR mechanisms further.


Subjects
Drug-Related Side Effects and Adverse Reactions, Pattern Recognition, Automated, Adverse Drug Reaction Reporting Systems, Artificial Intelligence, Feasibility Studies, Humans, Pharmacovigilance
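The ADR study above favours human-readable models (decision trees, classification rules) trained on features mined from a knowledge graph. The smallest such model is one rule over one binary feature, i.e. a depth-one decision tree (a OneR-style learner). The sketch below is that swapped-in, deliberately minimal technique, not the authors' pipeline; the drug names, the `target:`/`pathway:` feature labels and the causative labels are all made up for illustration.

```python
from collections import Counter


def one_rule(drug_features, labels):
    """Choose the single binary feature whose presence/absence best reproduces the labels.

    drug_features: dict mapping drug -> set of knowledge-graph-derived feature names.
    labels: dict mapping drug -> True (causative for the ADR) or False.
    Returns (feature, training accuracy, {feature present?: predicted label}).
    """
    features = sorted(set().union(*drug_features.values()))
    best = None
    for feature in features:
        # Split the drugs on whether they carry the feature and tally labels per branch.
        branches = {True: Counter(), False: Counter()}
        for drug, feats in drug_features.items():
            branches[feature in feats][labels[drug]] += 1
        # Each branch predicts its majority label; an empty branch predicts None.
        rule = {present: (c.most_common(1)[0][0] if c else None) for present, c in branches.items()}
        correct = sum(max(c.values(), default=0) for c in branches.values())
        accuracy = correct / len(drug_features)
        if best is None or accuracy > best[1]:
            best = (feature, accuracy, rule)
    return best
```

The returned triple reads directly as a rule such as "if the drug has feature X, predict causative", which is the kind of interpretable output the abstract argues experts can inspect; real decision trees simply stack more such splits.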
5.
Nucleic Acids Res ; 42(Database issue): D389-95, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24271397

ABSTRACT

Comparing, classifying and modelling protein structural interactions can enrich our understanding of many biomolecular processes. This contribution describes Kbdock (http://kbdock.loria.fr/), a database system that combines the Pfam domain classification with coordinate data from the PDB to analyse and model 3D domain-domain interactions (DDIs). Kbdock can be queried using Pfam domain identifiers, protein sequences or 3D protein structures. For a given query domain or pair of domains, Kbdock retrieves and displays a non-redundant list of homologous DDIs or domain-peptide interactions in a common coordinate frame. Kbdock may also be used to search for and visualize interactions involving different, but structurally similar, Pfam families. Thus, structural DDI templates may be proposed even when there is little or no sequence similarity to the query domains.


Subjects
Databases, Protein, Protein Interaction Domains and Motifs, Binding Sites, Internet, Models, Molecular, Molecular Docking Simulation, Protein Interaction Mapping, Protein Interaction Maps, Proteins/classification, Sequence Alignment, Sequence Analysis, Protein
6.
BMC Bioinformatics ; 14: 207, 2013 Jun 26.
Article in English | MEDLINE | ID: mdl-23802887

ABSTRACT

BACKGROUND: Drug side effects represent a common reason for stopping drug development during clinical trials. Improving our ability to understand drug side effects is necessary to reduce attrition rates during drug development as well as the risk of discovering novel side effects in available drugs. Today, most investigations deal with isolated side effects and overlook possible redundancy and their frequent co-occurrence. RESULTS: In this work, drug annotations are collected from the SIDER and DrugBank databases. Terms describing individual side effects reported in SIDER are clustered with a semantic similarity measure into term clusters (TCs). Maximal frequent itemsets are extracted from the resulting drug × TC binary table, leading to the identification of what we call side-effect profiles (SEPs). A SEP is defined as the longest combination of TCs shared by a significant number of drugs. Frequent SEPs are explored on the basis of integrated drug and target descriptors using two machine learning methods: decision trees and inductive logic programming. Although both methods yield explicit models, the inductive logic programming method performs relational learning and is able to exploit not only drug properties but also background knowledge. Learning efficiency is evaluated by cross-validation and direct testing with new molecules. Comparison of the two machine-learning methods shows that the inductive logic programming method displays greater sensitivity than decision trees and successfully exploits background knowledge such as functional annotations and pathways of drug targets, thereby producing rich and expressive rules. All models and theories are available on a dedicated web site. CONCLUSIONS: Side-effect profiles covering a significant number of drugs have been extracted from a drug × side-effect association table.
Integration of background knowledge concerning both chemical and biological spaces has been combined with a relational learning method for discovering rules which explicitly characterize drug-SEP associations. These rules are successfully used for predicting SEPs associated with new drugs.


Subjects
Artificial Intelligence, Computational Biology/methods, Drug-Related Side Effects and Adverse Reactions, Databases, Pharmaceutical, Decision Trees, Reproducibility of Results
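In the side-effect study above, side-effect profiles (SEPs) are maximal frequent itemsets mined from a binary drug × term-cluster table: combinations of TCs shared by enough drugs that no larger frequent combination contains them. The sketch below enumerates frequent itemsets level by level (Apriori-style) and then keeps the maximal ones. The TC labels and the minimum support of two drugs are hypothetical, and the abstract does not say which mining algorithm the authors actually used.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) enumeration of itemsets found in >= min_support transactions.

    transactions: list of sets, e.g. one set of term clusters per drug.
    """
    items = sorted({i for t in transactions for i in t})
    frequent = []
    current = [frozenset([i]) for i in items]
    while current:
        # Keep only candidates that enough drugs share.
        level = [s for s in current if sum(1 for t in transactions if s <= t) >= min_support]
        frequent.extend(level)
        # Join frequent k-itemsets differing by one item into (k+1)-item candidates.
        next_level = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == len(a) + 1:
                next_level.add(union)
        current = list(next_level)
    return frequent


def maximal_itemsets(frequent):
    """Keep the frequent itemsets that have no frequent proper superset (the SEP candidates)."""
    return [s for s in frequent if not any(s < other for other in frequent)]
```

With a toy table in which three drugs share `TC_nausea` and `TC_headache` and two share `TC_rash`, a minimum support of two yields exactly those two combinations as maximal itemsets; the singletons `TC_nausea` and `TC_headache` are dropped because a larger frequent set contains them.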
7.
Proteins ; 81(12): 2150-8, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24123156

ABSTRACT

Protein docking algorithms aim to calculate the three-dimensional (3D) structure of a protein complex starting from its unbound components. Although ab initio docking algorithms are improving, there is a growing need to use homology modeling techniques to exploit the rapidly increasing volumes of structural information that now exist. However, most current homology modeling approaches involve finding a pair of complete single-chain structures in a homologous protein complex to use as a 3D template, despite the fact that protein complexes are often formed from one or more domain-domain interactions (DDIs). To model 3D protein complexes by domain-domain homology, we have developed a case-based reasoning approach called KBDOCK which systematically identifies and reuses domain family binding sites from our database of nonredundant DDIs. When tested on 54 protein complexes from the Protein Docking Benchmark, our approach provides a near-perfect way to model single-domain protein complexes when full-homology templates are available, and it extends our ability to model more difficult cases when only partial or incomplete templates exist. These promising early results highlight the need for a new and diverse docking benchmark set, specifically designed to assess homology docking approaches.


Subjects
Computational Biology/methods, Protein Interaction Mapping/methods, Proteins/chemistry, Algorithms, Binding Sites, Databases, Protein, Models, Molecular, Molecular Docking Simulation, Programming Languages, Protein Conformation, Software
8.
Sci Rep ; 13(1): 3643, 2023 03 04.
Article in English | MEDLINE | ID: mdl-36871056

ABSTRACT

The search for an effective drug is still urgent for COVID-19, as no drug with proven clinical efficacy is available. Finding a new purpose for an approved or investigational drug, known as drug repurposing, has become increasingly popular in recent years. We propose here a new drug repurposing approach for COVID-19, based on knowledge graph (KG) embeddings. Our approach learns "ensemble embeddings" of entities and relations in a COVID-19-centric KG, in order to get a better latent representation of the graph elements. Ensemble KG embeddings are subsequently used in a deep neural network trained to discover potential drugs for COVID-19. Compared to related works, we retrieve more in-trial drugs among our top-ranked predictions, thus giving greater confidence in our predictions for out-of-trial drugs. For the first time to our knowledge, molecular docking is then used to evaluate the predictions obtained from drug repurposing using KG embedding. We show that Fosinopril is a potential ligand for the SARS-CoV-2 nsp13 target. We also provide explanations of our predictions using rules extracted from the KG and instantiated by KG-derived explanatory paths. Molecular evaluation and explanatory paths bring reliability to our results and constitute new complementary and reusable methods for assessing KG-based drug repurposing.


Subjects
COVID-19, Humans, SARS-CoV-2, Drug Repositioning, Molecular Docking Simulation, Pattern Recognition, Automated, Reproducibility of Results, Learning
9.
Open Res Eur ; 3: 97, 2023.
Article in English | MEDLINE | ID: mdl-37645489

ABSTRACT

Background: Data management is fast becoming an essential part of scientific practice, driven by open science and FAIR (findable, accessible, interoperable, and reusable) data sharing requirements. Whilst data management plans (DMPs) are clear to data management experts and data stewards, their purpose and creation are often obscure to the producers of the data, who, in academic environments, are often PhD students. Methods: Within the RNAct EU Horizon 2020 ITN project, we engaged the 10 RNAct early-stage researchers (ESRs) in a training project aimed at formulating a DMP. To do so, we used the Data Stewardship Wizard (DSW) framework and modified the existing Life Sciences Knowledge Model into a simplified version aimed at training young scientists, with computational or experimental backgrounds, in core data management principles. We collected feedback from the ESRs during this exercise. Results: Here, we introduce our new life-sciences training DMP template for young scientists. We report and discuss our experiences as principal investigators (PIs) and ESRs during this project and address the typical difficulties encountered in developing and understanding a DMP. Conclusions: We found that the DS-Wizard can also be an appropriate tool for DMP training, to get terminology and concepts across to researchers. A full training course additionally requires an upstream step to present basic DMP concepts and a downstream step to publish a dataset in a (public) repository. Overall, the DS-Wizard tool was essential for our DMP training, and we hope our efforts can be used in other projects.

10.
Bioinformatics ; 27(20): 2820-7, 2011 Oct 15.
Article in English | MEDLINE | ID: mdl-21873637

ABSTRACT

MOTIVATION: In recent years, much structural information on protein domains and their pair-wise interactions has been made available in public databases. However, it is not yet clear how best to use this information to discover general rules or interaction patterns about structural protein-protein interactions. Improving our ability to detect and exploit structural interaction patterns will help to provide a better 3D picture of the known protein interactome, and will help to guide docking-based predictions of the 3D structures of unsolved protein complexes. RESULTS: This article presents KBDOCK, a 3D database approach for spatially clustering protein binding sites and for performing template-based (knowledge-based) protein docking. KBDOCK combines residue contact information from the 3DID database with the Pfam protein domain family classification together with coordinate data from the Protein Data Bank. This allows the 3D configurations of all known hetero domain-domain interactions to be superposed and clustered for each Pfam family. We find that most Pfam domain families have up to four hetero binding sites, and over 60% of all domain families have just one hetero binding site. The utility of this approach for template-based docking is demonstrated using 73 complexes from the Protein Docking Benchmark. Overall, up to 45 out of 73 complexes may be modelled by direct homology to existing domain interfaces, and key binding site information is found for 24 of the 28 remaining complexes. These results show that KBDOCK can often provide useful information for predicting the structures of unknown protein complexes. AVAILABILITY: http://kbdock.loria.fr/ CONTACT: Dave.Ritchie@inria.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Protein Interaction Domains and Motifs, Protein Interaction Mapping/methods, Binding Sites, Databases, Protein, Humans, Models, Molecular, Multiprotein Complexes/chemistry
11.
JACC Cardiovasc Imaging ; 15(2): 193-208, 2022 02.
Article in English | MEDLINE | ID: mdl-34538625

ABSTRACT

OBJECTIVES: This study sought to identify homogeneous echocardiographic phenotypes in community-based cohorts and assess their association with outcomes. BACKGROUND: Asymptomatic cardiac dysfunction leads to a high risk of long-term cardiovascular morbidity and mortality; however, better echocardiographic classification of asymptomatic individuals remains a challenge. METHODS: Echocardiographic phenotypes were identified using K-means clustering in the first generation of the STANISLAS (Yearly non-invasive follow-up of Health status of Lorraine insured inhabitants) cohort (N = 827; mean age: 60 ± 5 years; men: 48%), and their associations with vascular function and circulating biomarkers were also assessed. These phenotypes were externally validated in the Malmö Preventive Project cohort (N = 1,394; mean age: 67 ± 6 years; men: 70%), and their associations with the composite of cardiovascular mortality (CVM) or heart failure hospitalization (HFH) were assessed as well. RESULTS: Three echocardiographic phenotypes were identified as "mostly normal (MN)" (n = 334), "diastolic changes (D)" (n = 323), and "diastolic changes with structural remodeling (D/S)" (n = 170). The D and D/S phenotypes had similar ages, body mass indices, cardiovascular risk factors, vascular impairments, and diastolic function changes. The D phenotype consisted mainly of women and featured increased levels of inflammatory biomarkers, whereas the D/S phenotype, consisting predominantly of men, displayed the highest values of left ventricular mass, volume, and remodeling biomarkers. The phenotypes were predicted based on a simple algorithm including e', left ventricular mass and volume (e'VM algorithm). In the Malmö cohort, subgroups derived from the e'VM algorithm were significantly associated with a higher risk of CVM and HFH (adjusted HR in the D phenotype = 1.87; 95% CI: 1.04 to 3.37; adjusted HR in the D/S phenotype = 3.02; 95% CI: 1.71 to 5.34).
CONCLUSIONS: Among asymptomatic, middle-aged individuals, echocardiographic data-driven classification based on the simple e'VM algorithm identified profiles with different long-term HF risk. (4th Visit at 17 Years of Cohort STANISLAS-Stanislas Ancillary Study ESCIF [STANISLASV4]; NCT01391442).


Subjects
Echocardiography, Heart Failure, Aged, Female, Heart Failure/diagnostic imaging, Heart Failure/epidemiology, Humans, Incidence, Machine Learning, Male, Middle Aged, Phenotype, Predictive Value of Tests, Prognosis, Stroke Volume, Ventricular Function, Left
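The STANISLAS analysis above derived its three echocardiographic phenotypes with K-means clustering. As a reminder of how that method works, the sketch below runs plain Lloyd's-algorithm K-means on z-scored features. The three columns echo the e'VM variables (e', left ventricular mass, left ventricular volume) but the values are invented, and the study's actual variable set, preprocessing and initialisation are not described in the abstract.

```python
import math


def standardize(rows):
    """Z-score each column so features on different scales weigh equally in the distance."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    # A constant column would give sd = 0; fall back to 1.0 to avoid dividing by zero.
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(m)]
    return [[(r[j] - means[j]) / sds[j] for j in range(m)] for r in rows]


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update until stable."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its closest centroid (Euclidean distance).
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Standardizing first matters here because left ventricular mass (tens to hundreds of grams) would otherwise dominate e' (a few cm/s) in the Euclidean distance.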
12.
Adv Exp Med Biol ; 696: 357-66, 2011.
Article in English | MEDLINE | ID: mdl-21431576

ABSTRACT

One current challenge in biomedicine is to analyze large amounts of complex biological data to extract domain knowledge. This work focuses on the use of knowledge-based techniques such as knowledge discovery (KD) and knowledge representation (KR) in pharmacogenomics, where knowledge units represent genotype-phenotype relationships in the context of a given treatment. One objective is to design a knowledge base (KB, here also referred to as an ontology) and then to use it in the KD process itself. A method is proposed for dealing with two main tasks: (1) building a KB from heterogeneous data related to genotype, phenotype, and treatment, and (2) applying KD techniques to knowledge assertions to extract genotype-phenotype relationships. An application was carried out on a clinical trial concerned with the variability of drug response to montelukast treatment. Genotype-genotype and genotype-phenotype associations were retrieved together with new associations, allowing the extension of the initial KB. This experiment shows the potential of KR and KD processes, especially for designing a KB, checking KB consistency, and reasoning for problem solving.


Subjects
Pharmacogenetics/statistics & numerical data, Acetates/pharmacology, Anti-Asthmatic Agents/pharmacology, Asthma/drug therapy, Asthma/genetics, Computational Biology, Cyclopropanes, Data Interpretation, Statistical, Data Mining/statistics & numerical data, Databases, Genetic, Genetic Association Studies/statistics & numerical data, Humans, Knowledge Bases, Logistic Models, Quinolines/pharmacology, Sulfides
13.
Cells ; 10(12)2021 12 07.
Article in English | MEDLINE | ID: mdl-34943948

ABSTRACT

Glioblastoma (GBM), the most common brain tumor in adults, is very aggressive and has a very poor prognosis. It affects men twice as often as women, suggesting that female hormones (estrogen) play a protective role. Using an in silico approach, we highlighted that the expression of the membrane G-protein-coupled estrogen receptor (GPER) had an impact on the survival of female GBM patients. In this context, we explored for the first time the role of the GPER agonist G-1 in GBM cell proliferation. Our results suggested that G-1 exposure had a cytostatic effect, leading to reversible G2/M arrest due to blockade of tubulin polymerization during mitosis. However, the observed effect was independent of GPER. Interestingly, G-1 potentiated the efficacy of temozolomide, the current standard chemotherapy treatment, since the combination of both treatments led to prolonged mitotic arrest, even in a less temozolomide-sensitive cell line. In conclusion, our results suggested that G-1, in combination with standard chemotherapy, might be a promising way to limit the progression and aggressiveness of GBM.


Subjects
Cyclopentanes/pharmacology, Glioblastoma/drug therapy, Quinolines/pharmacology, Receptors, Estrogen/genetics, Receptors, G-Protein-Coupled/genetics, Temozolomide/pharmacology, Tubulin/genetics, Animals, Apoptosis/drug effects, Cell Proliferation/drug effects, Gene Expression Regulation, Neoplastic/drug effects, Glioblastoma/genetics, Glioblastoma/pathology, Humans, Mice, Mitosis/drug effects, Receptors, G-Protein-Coupled/agonists, Xenograft Model Antitumor Assays
14.
Sci Rep ; 11(1): 4202, 2021 02 18.
Article in English | MEDLINE | ID: mdl-33603019

ABSTRACT

The choice of the most appropriate unsupervised machine-learning method for "heterogeneous" or "mixed" data, i.e. with both continuous and categorical variables, can be challenging. Our aim was to examine the performance of various clustering strategies for mixed data using both simulated and real-life data. We conducted a benchmark analysis of "ready-to-use" tools in R comparing 4 model-based (Kamila algorithm, Latent Class Analysis, Latent Class Model [LCM] and Clustering by Mixture Modeling) and 5 distance/dissimilarity-based (Gower distance or Unsupervised Extra Trees dissimilarity followed by hierarchical clustering or Partitioning Around Medoids, K-prototypes) clustering methods. Clustering performance was assessed by the Adjusted Rand Index (ARI) on 1000 generated virtual populations consisting of mixed variables using 7 scenarios with varying population sizes, number of clusters, number of continuous and categorical variables, proportions of relevant (non-noisy) variables and degree of variable relevance (low, mild, high). Clustering methods were then applied to the EPHESUS randomized clinical trial data (a heart failure trial evaluating the effect of eplerenone) to illustrate how the clustering techniques differ. The simulations revealed the dominance of K-prototypes, Kamila and LCM models over all other methods. Overall, methods using dissimilarity matrices in classical algorithms such as Partitioning Around Medoids and Hierarchical Clustering had a lower ARI compared to model-based methods in all scenarios. When applying clustering methods to a real-life clinical dataset, LCM showed promising results with regard to differences in (1) clinical profiles across clusters, (2) prognostic performance (highest C-index) and (3) identification of patient subgroups with substantial treatment benefit.
The present findings suggest key differences in clustering performance between the tested algorithms (limited to tools readily available in R). In most of the tested scenarios, model-based methods (in particular the Kamila and LCM packages) and K-prototypes typically performed best in the setting of heterogeneous data.
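This benchmark scores each clustering against the known simulated groups with the Adjusted Rand Index (ARI): 1 for a perfect match regardless of how clusters are numbered, and about 0 for agreement no better than chance. The stdlib sketch below computes the standard Hubert-Arabie ARI from the contingency table of two labelings; it is an illustration of the metric, not the R code used in the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index (Hubert and Arabie): 1.0 for identical partitions, ~0 for random ones."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("label lists must have the same length")
    if n < 2:
        return 1.0
    # Number of point pairs grouped together in both partitions (contingency-table cells),
    # in the reference partition (row sums) and in the predicted partition (column sums).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions are a single cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

For example, `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` returns 1.0 because only the cluster names differ, while splitting one true cluster in two, as in `adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])`, gives 4/7 (about 0.571).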

15.
BMC Bioinformatics ; 11: 588, 2010 Dec 01.
Article in English | MEDLINE | ID: mdl-21122125

ABSTRACT

BACKGROUND: The Gene Ontology (GO) is a well-known controlled vocabulary describing the biological process, molecular function and cellular component aspects of gene annotation. It has become a widely used knowledge source in bioinformatics for annotating genes and measuring their semantic similarity. These measures generally involve the GO graph structure, the information content of GO aspects, or a combination of both. However, only a few of the semantic similarity measures described so far can handle GO annotations differently according to their origin (i.e. their evidence codes). RESULTS: We present here a new semantic similarity measure called IntelliGO which integrates several complementary properties in a novel vector space model. The coefficients associated with each GO term that annotates a given gene or protein include its information content as well as a customized value for each type of GO evidence code. The generalized cosine similarity measure, used for calculating the dot product between two vectors, has been rigorously adapted to the context of the GO graph. The IntelliGO similarity measure is tested on two benchmark datasets consisting of KEGG pathways and Pfam domains grouped as clans, considering the GO biological process and molecular function terms, respectively, for a total of 683 yeast and human genes and involving more than 67,900 pair-wise comparisons. The ability of the IntelliGO similarity measure to express the biological cohesion of sets of genes compares favourably to four existing similarity measures. For inter-set comparison, it consistently discriminates between distinct sets of genes. Furthermore, the IntelliGO similarity measure allows the influence of weights assigned to evidence codes to be checked. Finally, the results obtained with a complementary reference technique give intermediate but correct correlation values with the sequence similarity, Pfam, and Enzyme classifications when compared to previously published measures.
CONCLUSIONS: The IntelliGO similarity measure provides a customizable and comprehensive method for quantifying gene similarity based on GO annotations. It also displays a robust set-discriminating power which suggests it will be useful for functional clustering. AVAILABILITY: An on-line version of the IntelliGO similarity measure is available at: http://bioinfo.loria.fr/Members/benabdsi/intelligo_project/


Subjects
Algorithms, Computational Biology/methods, Molecular Sequence Annotation, Cluster Analysis, Humans, Proteins/genetics, Saccharomyces cerevisiae/genetics, Semantics, Software, Terminology as Topic, Vocabulary, Controlled
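IntelliGO's central idea is a generalized cosine: each gene becomes a vector over GO terms whose coefficients combine information content with an evidence-code weight, and the dot product between two term axes reflects how those terms relate in the GO graph instead of treating them as orthogonal. The sketch below keeps that structure but assumes a simple stand-in term-term product (Jaccard overlap of ancestor sets), not IntelliGO's own graph-based definition. The toy terms, IC values and evidence weights are hypothetical; only the evidence-code names `EXP` and `IEA` are real GO codes.

```python
import math


def ancestors(term, parents):
    """All terms reachable upward from `term` (inclusive) in a child -> parents map."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def term_dot(t1, t2, parents):
    """Stand-in term-term inner product: Jaccard overlap of ancestor sets (1.0 when t1 == t2)."""
    a1, a2 = ancestors(t1, parents), ancestors(t2, parents)
    return len(a1 & a2) / len(a1 | a2)


def annotation_vector(annotations, ic, evidence_weight):
    """Coefficient per GO term = information content x evidence-code weight.

    If a term carries several evidence codes, the strongest weight is kept (an assumption).
    """
    vec = {}
    for term, code in annotations:
        vec[term] = max(vec.get(term, 0.0), ic[term] * evidence_weight[code])
    return vec


def generalized_dot(u, v, parents):
    """Sum over all term pairs of coefficient products times the term-term inner product."""
    return sum(cu * cv * term_dot(tu, tv, parents) for tu, cu in u.items() for tv, cv in v.items())


def generalized_cosine(u, v, parents):
    """Generalized cosine similarity between two annotation vectors."""
    denom = math.sqrt(generalized_dot(u, u, parents) * generalized_dot(v, v, parents))
    return generalized_dot(u, v, parents) / denom if denom else 0.0
```

In a toy hierarchy where `d` is a child of `b` and both `b` and `c` hang off `root`, a gene annotated only with `d` scores 2/3 against a gene annotated with `b` and 1/4 against one annotated with `c`, so related annotations count as partially similar rather than unrelated.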
16.
J Chem Inf Model ; 50(5): 701-15, 2010 May 24.
Article in English | MEDLINE | ID: mdl-20420434

ABSTRACT

In silico screening methodologies are widely recognized as efficient approaches in the early steps of drug discovery. However, in the virtual high-throughput screening (VHTS) context, where hit compounds are searched for among millions of candidates, three-dimensional comparison techniques and knowledge discovery from databases should be more efficient at finding novel drug leads than computationally expensive molecular docking. Therefore, the present study aims at developing a filtering methodology to efficiently eliminate unsuitable compounds in the VHTS process. Several filters are evaluated in this paper. The first two are structure-based and rely on either geometrical docking or pharmacophore depiction. The third filter is ligand-based and uses knowledge-based and fingerprint similarity techniques. These filtering methods were tested with the Liver X Receptor (LXR) as a target of therapeutic interest, as LXR is a key regulator in maintaining cholesterol homeostasis. The results show that the three filters are complementary, so their combination should generate consistent compound lists of potential hits.


Subjects
Drug Design, Orphan Nuclear Receptors/metabolism, Humans, Ligands, Liver X Receptors, Models, Molecular, Orphan Nuclear Receptors/chemistry, Protein Binding
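The third, ligand-based filter in the LXR study relies in part on fingerprint similarity. A common way to build such a filter is to keep library compounds whose Tanimoto coefficient to at least one known active reaches a cut-off. The sketch below assumes fingerprints are already computed as sets of on-bit indices; the 0.7 default threshold and the compound names are hypothetical, not values from the study, and the knowledge-based part of the filter is not modelled.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def similarity_filter(library, actives, threshold=0.7):
    """Keep compounds whose best Tanimoto score against any known active reaches the threshold.

    library: dict mapping compound name -> fingerprint (set of on-bit indices).
    actives: list of fingerprints of known actives for the target.
    Returns {compound name: best similarity} for the compounds that pass.
    """
    kept = {}
    for name, fp in library.items():
        best = max((tanimoto(fp, ref) for ref in actives), default=0.0)
        if best >= threshold:
            kept[name] = best
    return kept
```

Because the score only needs set intersections, a filter like this is cheap enough to run over millions of candidates before any docking, which is the efficiency argument the abstract makes for combining filters.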
17.
Yearb Med Inform ; 29(1): 188-192, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32823315

ABSTRACT

OBJECTIVES: Summarize recent research and select the best papers published in 2019 in the field of Bioinformatics and Translational Informatics (BTI) for the corresponding section of the International Medical Informatics Association Yearbook. METHODS: A literature review was performed for retrieving from PubMed papers indexed with keywords and free terms related to BTI. Independent review allowed the section editors to select a list of 15 candidate best papers which were subsequently peer-reviewed. A final consensus meeting gathering the whole Yearbook editorial committee was organized to finally decide on the selection of the best papers. RESULTS: Among the 931 retrieved papers covering the various subareas of BTI, the review process selected four best papers. The first paper presents a logical modeling of cancer pathways. Using their tools, the authors are able to identify two known behaviours of tumors. The second paper describes a deep-learning approach to predicting resistance to antibiotics in Mycobacterium tuberculosis. The authors of the third paper introduce a Genomic Global Positioning System (GPS) enabling comparison of genomic data with other individuals or genomics databases while preserving privacy. The fourth paper presents a multi-omics and temporal sequence-based approach to provide a better understanding of the sequence of events leading to Alzheimer's Disease. CONCLUSIONS: Thanks to the normalization of open data and open science practices, research in BTI continues to develop and mature. Noteworthy achievements are sophisticated applications of leading edge machine-learning methods dedicated to personalized medicine.


Subjects
Computational Biology , Genomics , Computational Biology/ethics , Humans , Machine Learning , Medical Informatics , Translational Research, Biomedical
18.
Sci Data ; 7(1): 3, 2020 01 02.
Article in English | MEDLINE | ID: mdl-31896797

ABSTRACT

Pharmacogenomics (PGx) studies how individual gene variations affect drug response phenotypes, which makes PGx-related knowledge a key component of precision medicine. A significant part of state-of-the-art PGx knowledge is accumulated in scientific publications, where it is hardly reusable by humans or software. Natural language processing techniques have been developed to guide the experts who curate this body of knowledge, but existing work is limited by the absence of a high-quality annotated corpus focused on the PGx domain. In particular, this absence restricts the use of supervised machine learning. This article introduces PGxCorpus, a manually annotated corpus designed to fill this gap and to enable the automatic extraction of PGx relationships from text. It comprises 945 sentences from 911 PubMed abstracts, annotated with PGx entities of interest (mainly gene variations, genes, drugs and phenotypes) and with the relationships between them. In this article, we present the corpus itself, its construction, and a baseline experiment that illustrates how it may be leveraged to synthesize and summarize PGx knowledge.


Subjects
Data Curation , Pharmacogenetics , Supervised Machine Learning , Humans , PubMed
19.
Yearb Med Inform ; 28(1): 190-193, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31419831

ABSTRACT

OBJECTIVES: To summarize recent research and select the best papers published in 2018 in the field of Bioinformatics and Translational Informatics (BTI) for the corresponding section of the International Medical Informatics Association (IMIA) Yearbook. METHODS: A literature review was performed to retrieve PubMed papers indexed with keywords and free terms related to BTI. Independent review allowed the two section editors to select a list of 14 candidate best papers, which were subsequently peer-reviewed. A final consensus meeting of the whole IMIA Yearbook editorial committee then decided on the selection of the best papers. RESULTS: Among the 636 retrieved papers published in 2018 in the various subareas of BTI, the review process selected four best papers. The first paper presents a computational method to identify molecular markers for targeted treatment of acute myeloid leukemia using multi-omics data (genome-wide gene expression profiles) and in vitro sensitivity to 160 chemotherapy drugs. The second paper describes a deep neural network approach to predict the survival of patients with glioma from digitised pathology images and genomic biomarkers. The authors of the third paper adopt a pan-cancer approach to exploit multi-omics data for drug repurposing. The fourth paper presents a graph-based semi-supervised method for accurate phenotype classification, applied to ovarian cancer. CONCLUSIONS: Thanks to the normalization of open data and open science practices, research in BTI continues to develop and mature. Noteworthy achievements include sophisticated applications of leading-edge machine-learning methods dedicated to personalized medicine.


Subjects
Artificial Intelligence , Computational Biology , Translational Research, Biomedical , Computational Biology/ethics , Humans , Machine Learning , Medical Informatics , Neoplasms/genetics , Neoplasms/pathology , Prognosis
20.
Int J Lab Hematol ; 41(6): 726-730, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31523903

ABSTRACT

INTRODUCTION: The confirmation time interval for the presence of antiphospholipid antibodies (aPL) has been extended to 12 weeks because epiphenomenal antibodies may disappear after 6 weeks. Our aim was to analyse the extended persistence of aPL positivity beyond the 12-week interval. METHODS: We retrospectively analysed our database of 23 856 aPL test samples collected between 2005 and 2017 from 17 367 consecutive patients. Among aPL-positive patients confirmed at 12 weeks, two groups were identified: those with and those without extended persistence beyond confirmatory testing. Percentages of extended persistence are given according to the initial aPL positivity profiles, and baseline laboratory variables are compared between the two groups. RESULTS: Three hundred and twenty-seven confirmed aPL-positive patients underwent subsequent testing. The vast majority of them displayed extended persistence in the long term: 89.6%, and up to 97.9% for patients with initial triple positivity. Compared with nonpersistent positive patients, extended persistent positive patients had more lupus anticoagulant (LA)-positive initial samples, as well as higher baseline LA test values and IgG anticardiolipin (aCL) titres. CONCLUSION: Data from a large database of an aPL referral laboratory showed that the 12-week interval defining persistence of aPL positivity was appropriate for the majority of patients. Furthermore, we identified baseline features associated with extended persistence.


Subjects
Antibodies, Antiphospholipid/blood , Adult , Antiphospholipid Syndrome/blood , Antiphospholipid Syndrome/immunology , Female , Humans , Lupus Coagulation Inhibitor/blood , Male , Middle Aged , Retrospective Studies , Time Factors