Results 1 - 18 of 18
1.
JMIR Med Inform ; 12: e57164, 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-38904984

ABSTRACT

BACKGROUND: Vaccines serve as a crucial public health tool, although vaccine hesitancy continues to pose a significant threat to full vaccine uptake and, consequently, community health. Understanding and tracking vaccine hesitancy is essential for effective public health interventions; however, traditional survey methods present various limitations. OBJECTIVE: This study aimed to create a real-time, natural language processing (NLP)-based tool to assess vaccine sentiment and hesitancy across 3 prominent social media platforms. METHODS: We mined and curated discussions in English from Twitter (subsequently rebranded as X), Reddit, and YouTube social media platforms posted between January 1, 2011, and October 31, 2021, concerning human papillomavirus; measles, mumps, and rubella; and unspecified vaccines. We tested multiple NLP algorithms to classify vaccine sentiment into positive, neutral, or negative and to classify vaccine hesitancy using the World Health Organization's (WHO) 3Cs (confidence, complacency, and convenience) hesitancy model, conceptualizing an online dashboard to illustrate and contextualize trends. RESULTS: We compiled over 86 million discussions. Our top-performing NLP models displayed accuracies ranging from 0.51 to 0.78 for sentiment classification and from 0.69 to 0.91 for hesitancy classification. Explorative analysis on our platform highlighted variations in online activity about vaccine sentiment and hesitancy, suggesting unique patterns for different vaccines. CONCLUSIONS: Our innovative system performs real-time analysis of sentiment and hesitancy on 3 vaccine topics across major social networks, providing crucial trend insights to assist campaigns aimed at enhancing vaccine uptake and public health.

2.
medRxiv ; 2024 May 13.
Article in English | MEDLINE | ID: mdl-38798420

ABSTRACT

Background: Initial insights into oncology clinical trial outcomes are often gleaned manually from conference abstracts. We aimed to develop an automated system to extract safety and efficacy information from study abstracts with high precision and fine granularity, transforming them into computable data for timely clinical decision-making. Methods: We collected clinical trial abstracts from key conferences and PubMed (2012-2023). The SEETrials system was developed with four modules: preprocessing, prompt modeling, knowledge ingestion, and postprocessing. We evaluated the system's performance qualitatively and quantitatively and assessed its generalizability across different cancer types: multiple myeloma (MM), breast, lung, lymphoma, and leukemia. Furthermore, the efficacy and safety of innovative therapies, including CAR-T, bispecific antibodies, and antibody-drug conjugates (ADC), in MM were analyzed across a large number of clinical trial studies. Results: SEETrials achieved high precision (0.958), recall (sensitivity) (0.944), and F1 score (0.951) across 70 data elements present in the MM trial studies. Generalizability tests on four additional cancers yielded precision, recall, and F1 scores within the 0.966-0.986 range. Variation in the distribution of safety and efficacy-related entities was observed across diverse therapies, with certain adverse events more common in specific treatments. Comparative performance analysis using overall response rate (ORR) and complete response (CR) highlighted differences among therapies: CAR-T (ORR: 88%, 95% CI: 84-92%; CR: 95%, 95% CI: 53-66%), bispecific antibodies (ORR: 64%, 95% CI: 55-73%; CR: 27%, 95% CI: 16-37%), and ADC (ORR: 51%, 95% CI: 37-65%; CR: 26%, 95% CI: 1-51%). Notable study heterogeneity was identified (>75% I² heterogeneity index scores) across several outcome entities analyzed within therapy subgroups.
Conclusion: SEETrials demonstrated highly accurate data extraction and versatility across different therapeutics and various cancer domains. Its automated processing of large datasets facilitates nuanced data comparisons, promoting the swift and effective dissemination of clinical insights.
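
The F1 score reported above is the harmonic mean of precision and recall. The following minimal Python sketch checks the reported figures against each other; the function name `f1_score` is an illustrative choice and is not taken from the SEETrials system.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the MM data elements.
seetrials_f1 = f1_score(0.958, 0.944)
print(round(seetrials_f1, 3))
```

Rounded to three decimals, the result agrees with the F1 score of 0.951 stated in the abstract.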

3.
Article in English | MEDLINE | ID: mdl-38520725

ABSTRACT

OBJECTIVES: The rapid expansion of biomedical literature necessitates automated techniques to discern relationships between biomedical concepts from extensive free text. Such techniques facilitate the development of detailed knowledge bases and highlight research deficiencies. The LitCoin Natural Language Processing (NLP) challenge, organized by the National Center for Advancing Translational Science, aims to evaluate such potential and provides a manually annotated corpus for methodology development and benchmarking. MATERIALS AND METHODS: For the named entity recognition (NER) task, we utilized ensemble learning to merge predictions from three domain-specific models, namely BioBERT, PubMedBERT, and BioM-ELECTRA, devised a rule-driven detection method for cell line and taxonomy names, and annotated 70 more abstracts as an additional corpus. We further finetuned the T0pp model, with 11 billion parameters, to boost the performance on relation extraction (RE) and leveraged entities' location information (eg, title, background) to enhance novelty prediction performance in RE. RESULTS: Our pioneering NLP system designed for this challenge secured first place in Phase I-NER and second place in Phase II-relation extraction and novelty prediction, outpacing over 200 teams. We tested OpenAI ChatGPT 3.5 and ChatGPT 4 in a Zero-Shot setting using the same test set, revealing that our finetuned model considerably surpasses these broad-spectrum large language models. DISCUSSION AND CONCLUSION: Our outcomes depict a robust NLP system excelling in NER and RE across various biomedical entities, emphasizing that task-specific models remain superior to generic large ones. Such insights are valuable for endeavors like knowledge graph development and hypothesis formulation in biomedical research.
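
The abstract says that predictions from three NER models were merged by ensemble learning but does not specify the merging rule. As one common possibility, the sketch below uses per-token majority voting; this is an assumption for illustration, not the authors' method, and the label strings and toy predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions: list[list[str]]) -> list[str]:
    """Merge per-token labels from several models by majority vote.

    Ties are broken in favour of the earliest model in `predictions`.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must label the same tokens")
    merged = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        # Pick the first label, in model order, that reaches the top count.
        merged.append(next(label for label in labels if counts[label] == best))
    return merged


# Hypothetical token-level outputs from three models on a four-token sentence.
model_a = ["B-Gene", "O", "B-Disease", "O"]
model_b = ["B-Gene", "O", "O", "O"]
model_c = ["O", "O", "B-Disease", "B-Chemical"]
merged = majority_vote([model_a, model_b, model_c])
print(merged)
```

Ordering the models by validation performance would make the tie-break favour the strongest model, which is a design choice rather than something stated in the abstract.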

4.
Stud Health Technol Inform ; 310: 639-643, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269887

ABSTRACT

Automatic extraction of relations between drugs/chemicals and proteins from ever-growing biomedical literature is required to build up-to-date knowledge bases in biomedicine. To promote the development of automated methods, BioCreative-VII organized a shared task, the DrugProt track, to recognize drug-protein entity relations from PubMed abstracts. We participated in the shared task and leveraged deep learning-based transformer models pre-trained on biomedical data to build ensemble approaches to automatically extract drug-protein relations from biomedical literature. On the main corpus of 10,750 abstracts, our best system obtained an F1-score of 77.60% (ranked 4th among 30 participating teams), and on the large-scale corpus of 2.4M documents, our system achieved a micro-averaged F1-score of 77.32% (ranked 2nd among 9 system submissions). This demonstrates the effectiveness of domain-specific transformer models and ensemble approaches for automatic relation extraction from biomedical literature.


Subjects
Electric Power Supplies , Knowledge Bases , PubMed
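
The micro-averaged F1-score mentioned in the abstract above pools true positives, false positives, and false negatives over all relation classes before computing precision and recall. The sketch below illustrates that computation; the class names and counts are invented for illustration and are not results from the DrugProt track.

```python
def micro_f1(counts: dict[str, dict[str, int]]) -> float:
    """Compute micro-averaged F1 from per-class tp/fp/fn counts."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    if tp == 0:
        return 0.0  # no correct predictions, so precision and recall are zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two illustrative relation classes.
toy_counts = {
    "INHIBITOR": {"tp": 80, "fp": 20, "fn": 10},
    "ACTIVATOR": {"tp": 10, "fp": 10, "fn": 20},
}
print(micro_f1(toy_counts))
```

Because the counts are pooled, frequent classes dominate the micro average, which is why it can differ from a macro average that weights every class equally.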
5.
J Am Med Inform Assoc ; 31(2): 375-385, 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-37952206

ABSTRACT

OBJECTIVES: We aim to build a generalizable information extraction system leveraging large language models to extract granular eligibility criteria information for diverse diseases from free text clinical trial protocol documents. We investigate the model's capability to extract criteria entities along with contextual attributes including values, temporality, and modifiers and present the strengths and limitations of this system. MATERIALS AND METHODS: The clinical trial data were acquired from https://ClinicalTrials.gov/. We developed a system, AutoCriteria, which comprises the following modules: preprocessing, knowledge ingestion, prompt modeling based on GPT, postprocessing, and interim evaluation. The final system evaluation was performed, both quantitatively and qualitatively, on 180 manually annotated trials encompassing 9 diseases. RESULTS: AutoCriteria achieves an overall F1 score of 89.42 across all 9 diseases in extracting the criteria entities, with the highest being 95.44 for nonalcoholic steatohepatitis and the lowest of 84.10 for breast cancer. Its overall accuracy is 78.95% in identifying all contextual information across all diseases. Our thematic analysis indicated accurate logic interpretation of criteria as one of the strengths and overlooking/neglecting the main criteria as one of the weaknesses of AutoCriteria. DISCUSSION: AutoCriteria demonstrates strong potential to extract granular eligibility criteria information from trial documents without requiring manual annotations. The prompts developed for AutoCriteria generalize well across different disease areas. Our evaluation suggests that the system handles complex scenarios including multiple arm conditions and logics. CONCLUSION: AutoCriteria currently encompasses a diverse range of diseases and holds potential to extend to more in the future. This signifies a generalizable and scalable solution, poised to address the complexities of clinical trial application in real-world settings.


Subjects
Breast Neoplasms , Natural Language Processing , Humans , Female , Information Storage and Retrieval , Breast Neoplasms/drug therapy , Language , Eligibility Determination/methods
6.
PeerJ ; 11: e16087, 2023.
Article in English | MEDLINE | ID: mdl-38077442

ABSTRACT

The Protein Kinase Ontology (ProKinO) is an integrated knowledge graph that conceptualizes the complex relationships among protein kinase sequence, structure, function, and disease in a human and machine-readable format. In this study, we have significantly expanded ProKinO by incorporating additional data on expression patterns and drug interactions. Furthermore, we have developed a completely new browser from the ground up to render the knowledge graph visible and interactive on the web. We have enriched ProKinO with new classes and relationships that capture information on kinase ligand binding sites, expression patterns, and functional features. These additions extend ProKinO's capabilities as a discovery tool, enabling it to uncover novel insights about understudied members of the protein kinase family. We next demonstrate the application of ProKinO. Specifically, through graph mining and aggregate SPARQL queries, we identify the p21-activated protein kinase 5 (PAK5) as one of the most frequently mutated dark kinases in human cancers with abnormal expression in multiple cancers, including a previously unappreciated role in acute myeloid leukemia. We have identified recurrent oncogenic mutations in the PAK5 activation loop predicted to alter substrate binding and phosphorylation. Additionally, we have identified common ligand/drug binding residues in PAK family kinases, underscoring ProKinO's potential application in drug discovery. The updated ontology browser and the addition of a web component, ProtVista, which enables interactive mining of kinase sequence annotations in 3D structures and Alphafold models, provide a valuable resource for the signaling community. The updated ProKinO database is accessible at https://prokino.uga.edu.


Subjects
Neoplasms , Protein Kinases , Humans , Protein Kinases/genetics , Ligands , Proteins/genetics , Phosphorylation
7.
PeerJ ; 11: e15815, 2023.
Article in English | MEDLINE | ID: mdl-37868056

ABSTRACT

The 534 protein kinases encoded in the human genome constitute a large druggable class of proteins that include both well-studied and understudied "dark" members. Accurate prediction of dark kinase functions is a major bioinformatics challenge. Here, we employ a graph mining approach that uses the evolutionary and functional context encoded in knowledge graphs (KGs) to predict protein and pathway associations for understudied kinases. We propose a new scalable graph embedding approach, RegPattern2Vec, which employs regular pattern constrained random walks to sample diverse aspects of node context within a KG flexibly. RegPattern2Vec learns functional representations of kinases, interacting partners, post-translational modifications, pathways, cellular localization, and chemical interactions from a kinase-centric KG that integrates and conceptualizes data from curated heterogeneous data resources. By contextualizing information relevant to prediction, RegPattern2Vec improves accuracy and efficiency in comparison to other random walk-based graph embedding approaches. We show that the predictions produced by our model overlap with pathway enrichment data produced using experimentally validated Protein-Protein Interaction (PPI) data from both publicly available databases and experimental datasets not used in training. Our model also has the advantage of using the collected random walks as biological context to interpret the predicted protein-pathway associations. We provide high-confidence pathway predictions for 34 dark kinases and present three case studies in which analysis of meta-paths associated with the prediction enables biological interpretation. Overall, RegPattern2Vec efficiently samples multiple node types for link prediction on biological knowledge graphs and the predicted associations between understudied kinases, pseudokinases, and known pathways serve as a conceptual starting point for hypothesis generation and testing.


Subjects
Pattern Recognition, Automated , Proteins , Humans , Proteins/genetics , Computational Biology , Learning , Knowledge
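
The abstract above describes random walks on a knowledge graph that are constrained by a regular pattern. The sketch below illustrates the general idea with a much simpler constraint, a cyclic sequence of node types; it is not the RegPattern2Vec implementation, and the toy graph, node names, and type labels are hypothetical.

```python
import random


def constrained_walk(graph, node_type, start, pattern, length, rng):
    """Random walk in which the node at step i must have type pattern[i % len(pattern)]."""
    if node_type[start] != pattern[0]:
        raise ValueError("start node does not match the first pattern element")
    walk = [start]
    for step in range(1, length):
        wanted = pattern[step % len(pattern)]
        # Keep only neighbours whose type matches the next pattern element.
        candidates = [n for n in graph.get(walk[-1], []) if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbour satisfies the pattern
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical kinase-centric toy graph (adjacency lists) with typed nodes.
graph = {
    "K1": ["P1", "W1"],
    "K2": ["P1"],
    "P1": ["K1", "K2", "W1"],
    "W1": ["K1", "P1"],
}
node_type = {"K1": "kinase", "K2": "kinase", "P1": "protein", "W1": "pathway"}

walk = constrained_walk(
    graph, node_type, "K1", ["kinase", "protein", "pathway"], 5, random.Random(0)
)
print(walk)
```

In an embedding pipeline, many such walks would be collected and treated as sentences for a skip-gram style model; that step is omitted here.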
8.
Sci Signal ; 15(753): eabk1147, 2022 Sep 27.
Article in English | MEDLINE | ID: mdl-36166510

ABSTRACT

Spinocerebellar ataxia type 14 (SCA14) is a neurodegenerative disease caused by germline variants in the diacylglycerol (DAG)/Ca2+-regulated protein kinase Cγ (PKCγ), leading to Purkinje cell degeneration and progressive cerebellar dysfunction. Most of the identified mutations cluster in the DAG-sensing C1 domains. Here, we found with a FRET-based activity reporter that SCA14-associated PKCγ mutations, including a previously undescribed variant, D115Y, enhanced the basal activity of the kinase by compromising its autoinhibition. Unlike other mutations in PKC that impair its autoinhibition but lead to its degradation, the C1 domain mutations protected PKCγ from such down-regulation. This enhanced basal signaling rewired the brain phosphoproteome, as revealed by phosphoproteomic analysis of cerebella from mice expressing a human SCA14-associated H101Y mutant PKCγ transgene. Mutations that induced a high basal activity in vitro were associated with earlier average age of onset in patients. Furthermore, the extent of disrupted autoinhibition, but not agonist-stimulated activity, correlated with disease severity. Molecular modeling indicated that almost all SCA14 variants not within the C1 domain were located at interfaces with the C1B domain, suggesting that mutations in and proximal to the C1B domain are a susceptibility for SCA14 because they uniquely enhance PKCγ basal activity while protecting the enzyme from down-regulation. These results provide insight into how PKCγ activation is modulated and how deregulation of the cerebellar phosphoproteome by SCA14-associated mutations affects disease progression.


Subjects
Diglycerides , Spinocerebellar Ataxias , Animals , Diglycerides/metabolism , Humans , Mice , Mutation , Protein Kinase C , Purkinje Cells/metabolism , Spinocerebellar Ataxias/genetics
9.
BMC Bioinformatics ; 22(1): 446, 2021 Sep 18.
Article in English | MEDLINE | ID: mdl-34537014

ABSTRACT

BACKGROUND: Protein kinases are among the largest druggable family of signaling proteins, involved in various human diseases, including cancers and neurodegenerative disorders. Despite their clinical relevance, nearly 30% of the 545 human protein kinases remain highly understudied. Comparative genomics is a powerful approach for predicting and investigating the functions of understudied kinases. However, an incomplete knowledge of kinase orthologs across fully sequenced kinomes severely limits the application of comparative genomics approaches for illuminating understudied kinases. Here, we introduce KinOrtho, a query- and graph-based orthology inference method that combines full-length and domain-based approaches to map one-to-one kinase orthologs across 17 thousand species. RESULTS: Using multiple metrics, we show that KinOrtho performed better than existing methods in identifying kinase orthologs across evolutionarily divergent species and eliminated potential false positives by flagging sequences without a proper kinase domain for further evaluation. We demonstrate the advantage of using domain-based approaches for identifying domain fusion events, highlighting a case between an understudied serine/threonine kinase TAOK1 and a metabolic kinase PIK3C2A with high co-expression in human cells. We also identify evolutionary fission events involving the understudied OBSCN kinase domains, further highlighting the value of domain-based orthology inference approaches. Using KinOrtho-defined orthologs, Gene Ontology annotations, and machine learning, we propose putative biological functions of several understudied kinases, including the role of TP53RK in cell cycle checkpoint(s), the involvement of TSSK3 and TSSK6 in acrosomal vesicle localization, and potential functions for the ULK4 pseudokinase in neuronal development. 
CONCLUSIONS: In sum, KinOrtho presents a novel query-based tool to identify one-to-one orthologous relationships across thousands of proteomes that can be applied to any protein family of interest. We exploit KinOrtho here to identify kinase orthologs and show that its well-curated kinome ortholog set can serve as a valuable resource for illuminating understudied kinases, and the KinOrtho framework can be extended to any protein-family of interest.


Subjects
Biological Evolution , Genomics , Humans , Molecular Sequence Annotation , Protein Kinases/genetics , Protein Serine-Threonine Kinases , Proteins
10.
BMC Bioinformatics ; 21(1): 520, 2020 Nov 12.
Article in English | MEDLINE | ID: mdl-33183223

ABSTRACT

BACKGROUND: Protein kinases are a large family of druggable proteins that are genomically and proteomically altered in many human cancers. Kinase-targeted drugs are emerging as promising avenues for personalized medicine because of the differential response shown by altered kinases to drug treatment in patients and cell-based assays. However, an incomplete understanding of the relationships connecting genome, proteome and drug sensitivity profiles presents a major bottleneck in targeting kinases for personalized medicine. RESULTS: In this study, we propose a multi-component Quantitative Structure-Mutation-Activity Relationship Tests (QSMART) model and neural networks framework for providing explainable models of protein kinase inhibition and drug response ([Formula: see text]) profiles in cell lines. Using non-small cell lung cancer as a case study, we show that interaction terms that capture associations between drugs, pathways, and mutant kinases quantitatively contribute to the response of two EGFR inhibitors (afatinib and lapatinib). In particular, protein-protein interactions associated with the JNK apoptotic pathway, associations between lung development and axon extension, and interaction terms connecting drug substructures and the volume/charge of mutant residues at specific structural locations contribute significantly to the observed [Formula: see text] values in cell-based assays. CONCLUSIONS: By integrating multi-omics data in the QSMART model, we not only predict drug responses in cancer cell lines with high accuracy but also identify features and explainable interaction terms contributing to the accuracy. Although we have tested our multi-component explainable framework on protein kinase inhibitors, it can be extended across the proteome to investigate the complex relationships connecting genotypes and drug sensitivity profiles.


Subjects
Neural Networks, Computer , Protein Kinase Inhibitors/chemistry , Quantitative Structure-Activity Relationship , Afatinib/pharmacology , Carcinoma, Non-Small-Cell Lung/metabolism , Carcinoma, Non-Small-Cell Lung/pathology , Cell Line, Tumor , ErbB Receptors/antagonists & inhibitors , ErbB Receptors/genetics , ErbB Receptors/metabolism , Humans , Lapatinib/pharmacology , Lung Neoplasms/metabolism , Lung Neoplasms/pathology , MAP Kinase Signaling System/drug effects , Mutation , Precision Medicine , Protein Interaction Maps/drug effects , Protein Kinase Inhibitors/metabolism , Protein Kinase Inhibitors/pharmacology
11.
Elife ; 9, 2020 04 01.
Article in English | MEDLINE | ID: mdl-32234211

ABSTRACT

Glycosyltransferases (GTs) are prevalent across the tree of life and regulate nearly all aspects of cellular functions. The evolutionary basis for their complex and diverse modes of catalytic functions remains enigmatic. Here, based on deep mining of over half a million GT-A fold sequences, we define a minimal core component shared among functionally diverse enzymes. We find that variations in the common core and emergence of hypervariable loops extending from the core contributed to GT-A diversity. We provide a phylogenetic framework relating diverse GT-A fold families for the first time and show that inverting and retaining mechanisms emerged multiple times independently during evolution. Using evolutionary information encoded in primary sequences, we trained a machine learning classifier to predict donor specificity with nearly 90% accuracy and deployed it for the annotation of understudied GTs. Our studies provide an evolutionary framework for investigating complex relationships connecting GT-A fold sequence, structure, function and regulation.


Carbohydrates are one of the major groups of large biological molecules that regulate nearly all aspects of life. Yet, unlike DNA or proteins, carbohydrates are made without a template to follow. Instead, these molecules are built from a set of sugar-based building blocks by the intricate activities of a large and diverse family of enzymes known as glycosyltransferases. An incomplete understanding of how glycosyltransferases recognize and build diverse carbohydrates presents a major bottleneck in developing therapeutic strategies for diseases associated with abnormalities in these enzymes. It also limits efforts to engineer these enzymes for biotechnology applications and biofuel production. Taujale et al. have now used evolutionary approaches to map the evolution of a major subset of glycosyltransferases from species across the tree of life to understand how these enzymes evolved such precise mechanisms to build diverse carbohydrates. First, a minimal structural unit was defined based on being shared among a group of over half a million unique glycosyltransferase enzymes with different activities. Further analysis then showed that the diverse activities of these enzymes evolved through the accumulation of mutations within this structural unit, as well as in much more variable regions in the enzyme that extend from the minimal unit. Taujale et al. then built an extended family tree for this collection of glycosyltransferases and details of the evolutionary relationships between the enzymes helped them to create a machine learning framework that could predict which sugar-containing molecules were the raw materials for a given glycosyltransferase. This framework could make predictions with nearly 90% accuracy based only on information that can be deciphered from the gene for that enzyme. 
These findings will provide scientists with new hypotheses for investigating the complex relationships connecting the genetic information about glycosyltransferases with their structures and activities. Further refinement of the machine learning framework may eventually enable the design of enzymes with properties that are desirable for applications in biotechnology.


Subjects
Glycosyltransferases/chemistry , Protein Folding , Evolution, Molecular , Humans , Phylogeny , Substrate Specificity
12.
Sci Rep ; 8(1): 6518, 2018 04 25.
Article in English | MEDLINE | ID: mdl-29695735

ABSTRACT

Many bioinformatics resources with unique perspectives on the protein landscape are currently available. However, generating new knowledge from these resources requires interoperable workflows that support cross-resource queries. In this study, we employ federated queries linking information from the Protein Kinase Ontology, iPTMnet, Protein Ontology, neXtProt, and the Mouse Genome Informatics to identify key knowledge gaps in the functional coverage of the human kinome and prioritize understudied kinases, cancer variants and post-translational modifications (PTMs) for functional studies. We identify 32 functional domains enriched in cancer variants and PTMs and generate mechanistic hypotheses on overlapping variant and PTM sites by aggregating information at the residue, protein, pathway and species level from these resources. We experimentally test the hypothesis that S768 phosphorylation in the C-helix of EGFR is inhibitory by showing that oncogenic variants altering S768 phosphorylation increase basal EGFR activity. In contrast, oncogenic variants altering conserved phosphorylation sites in the 'hydrophobic motif' of PKCβII (S660F and S660C) are loss-of-function in that they reduce kinase activity and enhance membrane translocation. Our studies provide a framework for integrative, consistent, and reproducible annotation of the cancer kinomes.


Subjects
Mutation/genetics , Neoplasms/genetics , Protein Kinases/genetics , Protein Processing, Post-Translational/genetics , Proteins/genetics , Animals , CHO Cells , COS Cells , Cell Line , Chlorocebus aethiops , Computational Biology/methods , Cricetulus , Gene Ontology , Genetic Variation/genetics , Humans , Mice , Phosphorylation/genetics
13.
J Am Med Inform Assoc ; 23(4): 750-7, 2016 07.
Article in English | MEDLINE | ID: mdl-27013523

ABSTRACT

OBJECTIVE: Clinical trials investigating drugs that target specific genetic alterations in tumors are important for promoting personalized cancer therapy. The goal of this project is to create a knowledge base of cancer treatment trials with annotations about genetic alterations from ClinicalTrials.gov. METHODS: We developed a semi-automatic framework that combines advanced text-processing techniques with manual review to curate genetic alteration information in cancer trials. The framework consists of a document classification system to identify cancer treatment trials from ClinicalTrials.gov and an information extraction system to extract gene and alteration pairs from the Title and Eligibility Criteria sections of clinical trials. By applying the framework to trials at ClinicalTrials.gov, we created a knowledge base of cancer treatment trials with genetic alteration annotations. We then evaluated each component of the framework against manually reviewed sets of clinical trials and generated descriptive statistics of the knowledge base. RESULTS AND DISCUSSION: The automated cancer treatment trial identification system achieved a high precision of 0.9944. Together with the manual review process, it identified 20 193 cancer treatment trials from ClinicalTrials.gov. The automated gene-alteration extraction system achieved a precision of 0.8300 and a recall of 0.6803. After validation by manual review, we generated a knowledge base of 2024 cancer trials that are labeled with specific genetic alteration information. Analysis of the knowledge base revealed the trend of increased use of targeted therapy for cancer, as well as top frequent gene-alteration pairs of interest. We expect this knowledge base to be a valuable resource for physicians and patients who are seeking information about personalized cancer therapy.


Subjects
Clinical Trials as Topic , Data Mining , Knowledge Bases , Neoplasms/genetics , DNA, Neoplasm , Databases, Factual , Humans , Mutation , Natural Language Processing , Neoplasms/therapy , Precision Medicine
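
The abstract above describes extracting gene and alteration pairs from trial titles and eligibility criteria. The sketch below shows a deliberately simple regular-expression approach to that kind of extraction; it is an illustration only, not the authors' information extraction system. The pattern handles only a gene symbol immediately followed by a single amino-acid substitution, and the example sentence is invented.

```python
import re

# Gene symbol (2-10 uppercase letters/digits) followed by a substitution such as V600E.
ALTERATION = re.compile(r"\b([A-Z][A-Z0-9]{1,9})\s+([A-Z]\d{1,4}[A-Z])\b")


def extract_pairs(text: str) -> list[tuple[str, str]]:
    """Return (gene, alteration) pairs found in free text."""
    return [(m.group(1), m.group(2)) for m in ALTERATION.finditer(text)]


# Invented eligibility-criteria sentence for illustration.
criteria = "Patients with BRAF V600E or EGFR T790M mutations are eligible."
pairs = extract_pairs(criteria)
print(pairs)
```

A pattern this narrow misses fusions, amplifications, and alterations written far from the gene name, which may be part of why the reported recall (0.6803) is lower than the precision (0.8300) even for a far more capable system.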
14.
Biomed Res Int ; 2015: 491502, 2015.
Article in English | MEDLINE | ID: mdl-26539502

ABSTRACT

An accurate classification of human cancer, including its primary site, is important for a better understanding of cancer and for the development of effective therapeutic strategies. The available big data on somatic mutations provide a great opportunity to investigate cancer classification using machine learning. Here, we explored the patterns of 1,760,846 somatic mutations identified from 230,255 cancer patients, along with gene function information, using a support vector machine. Specifically, we performed a multiclass classification experiment over the 17 tumor sites using the gene symbol, somatic mutation, chromosome, and gene functional pathway as predictors for 6,751 subjects. The baseline using only gene features achieved an accuracy of 0.57, which improved to 0.62 when mutation and chromosome information was added. Among the predictable primary tumor sites, five (large intestine, liver, skin, pancreas, and lung) achieved an F-measure above 0.70. The large intestine model ranked first, with an F-measure of 0.87. The results demonstrate that somatic mutation information is useful for predicting primary tumor sites with machine learning modeling. To our knowledge, this study is the first investigation of primary site classification using machine learning and somatic mutation data.


Subjects
Mutation , Neoplasms, Unknown Primary/classification , Neoplasms, Unknown Primary/genetics , Female , Humans , Intestine, Large/pathology , Liver/pathology , Lung/pathology , Male , Neoplasms, Unknown Primary/pathology , Pancreas/pathology , Skin/pathology , Support Vector Machine
15.
BMC Syst Biol ; 9 Suppl 4: S2, 2015.
Article in English | MEDLINE | ID: mdl-26100720

ABSTRACT

BACKGROUND: Computational pharmacology can uniquely address some issues in the process of drug development by providing a macroscopic view and a deeper understanding of drug action. Specifically, the network-assisted approach is promising for the inference of drug repurposing. However, the drug-target associations coming from different sources and various assays are noisy, which inflates the inference errors. To reduce the inference errors, it is necessary and critical to create a comprehensive and weighted data set of drug-target associations. RESULTS: In this study, we created a weighted and integrated drug-target interactome (WinDTome) to provide a comprehensive resource of drug-target associations for computational pharmacology. We first collected drug-target interactions from six commonly used drug-target centered data sources including DrugBank, KEGG, TTD, MATADOR, PDSP Ki Database, and BindingDB. Then, we employed the record linkage method to normalize drugs and targets to unique identifiers by utilizing public data sources including PubChem, Entrez Gene, and UniProt. To assess the reliability of the drug-target associations, we assigned two scores (Score_S and Score_R) to each drug-target association based on their data sources and publication references. Consequently, the WinDTome contains 546,196 drug-target associations among 303,018 compounds and 4,113 genes. To assess the application of the WinDTome, we designed a network-based approach for drug repurposing using the mental disorder schizophrenia (SCZ) as a case. Starting from 41 known SCZ drugs and their targets, we inferred a total of 264 potential SCZ drugs through the drug-target associations with Score_S higher than two in WinDTome and human protein-protein interactions. Among the 264 SCZ-related drugs, 39 drugs have been investigated in clinical trials for SCZ treatment and 74 drugs for the treatment of other mental disorders, respectively.
Compared with the results using other Score_S cutoff values, single data source, or the data from STITCH, the inference of 264 SCZ-related drugs had the highest performance. CONCLUSIONS: The WinDTome generated in this study contains comprehensive drug-target associations with confidence scores. Its application to the SCZ drug repurposing demonstrated that the WinDTome is promising to serve as a useful resource for drug repurposing.
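
The inference step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `infer_candidates`, the tuple layout of the associations, and the one-step expansion over protein-protein interactions are all assumptions, and the toy identifiers are invented.

```python
from collections import defaultdict


def infer_candidates(associations, ppi_edges, known_drugs, score_cutoff=2):
    """Return drugs linked to the known drugs through shared or interacting targets.

    associations: iterable of (drug, target, score_s) tuples (hypothetical layout).
    ppi_edges: iterable of (protein_a, protein_b) pairs, treated as undirected.
    known_drugs: set of seed drugs, e.g. known SCZ drugs.
    """
    drug_to_targets = defaultdict(set)
    target_to_drugs = defaultdict(set)
    for drug, target, score_s in associations:
        # Keep only associations whose Score_S is strictly above the cutoff.
        if score_s > score_cutoff:
            drug_to_targets[drug].add(target)
            target_to_drugs[target].add(drug)

    neighbors = defaultdict(set)
    for a, b in ppi_edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Targets of the seed drugs.
    seed_targets = set()
    for drug in known_drugs:
        seed_targets |= drug_to_targets.get(drug, set())

    # Assumption: expand one step along protein-protein interactions.
    expanded = set(seed_targets)
    for target in seed_targets:
        expanded |= neighbors.get(target, set())

    # Any other drug hitting an expanded target is a candidate.
    candidates = set()
    for target in expanded:
        candidates |= target_to_drugs.get(target, set())
    return candidates - set(known_drugs)


# Toy data, invented for illustration only.
toy_associations = [
    ("drugA", "T1", 3),
    ("drugB", "T1", 4),
    ("drugC", "T2", 3),
    ("drugD", "T2", 1),  # dropped: Score_S not above the cutoff
    ("drugE", "T3", 5),  # not reachable from the seed targets
]
toy_ppi = [("T1", "T2")]
print(sorted(infer_candidates(toy_associations, toy_ppi, {"drugA"})))
```

Raising or lowering `score_cutoff` changes how many associations survive the filter, which is the kind of comparison across cutoff values the abstract mentions.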


Subjects
Computational Biology/methods , Drug Repositioning , Molecular Targeted Therapy , Pharmaceutical Preparations/metabolism , Schizophrenia/drug therapy , Databases, Pharmaceutical , Humans , Protein Binding , Proteins/metabolism , Schizophrenia/metabolism
16.
Biomed Res Int ; 2014: 258784, 2014.
Article in English | MEDLINE | ID: mdl-24689033

ABSTRACT

Drug addiction is a chronic and complex brain disease that places a heavy burden on the community. Although numerous efforts have been made to identify effective treatments, novel therapeutics for this complex disease are still needed. Because network pharmacology has become a promising approach for drug repurposing, we proposed applying it to drug addiction, which might provide new clues for the development of effective addiction treatment drugs. We first extracted 44 addictive drugs from the NIDA and their targets from DrugBank. We then constructed two networks: an addictive drug-target network and an expanded addictive drug-target network, built by adding other drugs that share at least one target with these addictive drugs. Network analyses showed that addictive drugs with similar actions tended to cluster together. Additionally, we predicted 94 nonaddictive drugs with potential pharmacological functions related to the addictive drugs. In PubMed data, 51 drugs co-occurred with addiction-related keywords significantly more often than expected. Thus, the network analyses provide a list of candidate drugs for further investigation of their potential in addiction treatment or risk.
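
The network expansion described in this abstract (adding drugs that share at least one target with the addictive drugs) might look like the sketch below. The function name `expand_by_shared_targets`, the dictionary layout, and the toy identifiers are assumptions made for illustration; the abstract does not specify the actual data structures.

```python
def expand_by_shared_targets(drug_targets, seed_drugs):
    """Map each non-seed drug to the targets it shares with the seed drugs.

    drug_targets: dict of drug name -> set of target names (hypothetical layout).
    seed_drugs: set of seed drugs, e.g. the addictive drugs.
    """
    # Union of all targets hit by any seed drug.
    seed_targets = set().union(
        *(drug_targets[d] for d in seed_drugs if d in drug_targets)
    )
    expanded = {}
    for drug, targets in drug_targets.items():
        if drug in seed_drugs:
            continue
        shared = targets & seed_targets
        if shared:  # at least one common target
            expanded[drug] = shared
    return expanded


# Toy data, invented for illustration only.
toy_drug_targets = {
    "seed1": {"T1", "T2"},
    "seed2": {"T2"},
    "x": {"T2", "T9"},
    "y": {"T7"},
}
print(expand_by_shared_targets(toy_drug_targets, {"seed1", "seed2"}))
```

Here `x` is added because it shares `T2` with the seed drugs, while `y` is left out because it shares none.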


Subjects
Drug Repositioning , Forecasting/methods , Illicit Drugs/adverse effects , Substance-Related Disorders/drug therapy , Cluster Analysis , Databases, Factual , Drug Interactions , Humans , Illicit Drugs/isolation & purification , National Institute on Drug Abuse (U.S.) , Substance-Related Disorders/prevention & control , United States
17.
Proteomics ; 13(2): 313-24, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23184540

ABSTRACT

The prediction of adverse drug reactions (ADRs) has become increasingly important because of rising concern about serious ADRs that can cause drugs to fail to reach or stay on the market. We proposed a framework for predicting ADR profiles by integrating protein-protein interaction (PPI) networks with drug structures. We compared ADR prediction performance over 18 ADR categories using four feature groups: drug targets only, drug targets with PPI networks, drug structures, and drug targets with PPI networks plus drug structures. The results showed that integrating PPI networks and drug structures can significantly improve ADR prediction performance. The median AUC values for the four groups were 0.59, 0.61, 0.65, and 0.70. We used the protein features in the two best models, "Cardiac disorders" (median AUC: 0.82) and "Psychiatric disorders" (median AUC: 0.76), to build ADR-specific PPI networks with literature support. For validation, we examined 30 drugs withdrawn from the U.S. market to see whether our approach could predict their ADR profiles and explain why they were withdrawn. Excluding three drugs whose ADRs fell in categories we did not predict, our approach successfully predicted 25 of the 27 withdrawn drugs (92.6%) that had severe ADRs.
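
The validation figure in this abstract follows from simple arithmetic: 30 withdrawn drugs minus the 3 outside the predicted categories leaves 27, of which 25 were predicted. A small check, with the helper name `success_rate` chosen here purely for illustration:

```python
def success_rate(total_drugs, out_of_scope, predicted):
    """Return (number evaluated, percentage predicted rounded to one decimal)."""
    evaluated = total_drugs - out_of_scope
    return evaluated, round(100 * predicted / evaluated, 1)


# Figures taken from the abstract: 30 withdrawn drugs, 3 out of scope, 25 predicted.
print(success_rate(30, 3, 25))
```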


Subjects
Computational Biology/methods , Drug-Related Side Effects and Adverse Reactions/metabolism , Pharmaceutical Preparations/chemistry , Pharmacology/methods , Protein Interaction Maps , Area Under Curve , Chemistry, Pharmaceutical , Databases, Protein , Heart Diseases , Humans , Mental Disorders , Pharmaceutical Preparations/metabolism , Proteins/chemistry , Proteins/metabolism , Reproducibility of Results , Support Vector Machine
18.
BMC Genomics ; 12 Suppl 5: S11, 2011 Dec 23.
Article in English | MEDLINE | ID: mdl-22369493

ABSTRACT

BACKGROUND: Studies of toxicity and unintended side effects can lead to improved drug safety and efficacy. One promising form of study comes from molecular systems biology in the form of "systems pharmacology". Systems pharmacology combines data from clinical observation and molecular biology. This approach is new, however, and there are few examples of how it can practically predict adverse drug reactions (ADRs) of an experimental drug with acceptable accuracy. RESULTS: We have developed a new and practical computational framework to predict ADRs of trial drugs. We combine clinical observation data with drug target data, protein-protein interaction (PPI) networks, and gene ontology (GO) annotations. We use cardiotoxicity, one of the major causes of drug withdrawals, as a case study to demonstrate the power of the framework. Our results show that an in silico model built on this framework can achieve satisfactory cardiotoxicity ADR prediction performance (median AUC = 0.771, Accuracy = 0.675, Sensitivity = 0.632, and Specificity = 0.789). Our results also demonstrate the significance of incorporating prior knowledge, including gene networks and gene annotations, to improve future ADR assessments. CONCLUSIONS: Biomolecular network and gene annotation information can significantly improve the accuracy of ADR prediction for drugs under development. The use of PPI networks can increase prediction specificity, and the use of GO annotations can increase prediction sensitivity. Using cardiotoxicity as an example, we were able to further identify cardiotoxicity-related proteins in PPI networks expanded from drug targets. The systems pharmacology approach developed in this study can be applied generally to future ADR assessments and predictions for drugs in development.
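
For readers less familiar with the metrics reported in this abstract, the sketch below shows the standard definitions of accuracy, sensitivity, and specificity computed from a confusion matrix. The function name `classification_metrics` and the counts are hypothetical and are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    return {
        # Fraction of true positives (e.g. cardiotoxic drugs) that are detected.
        "sensitivity": tp / (tp + fn),
        # Fraction of true negatives that are correctly cleared.
        "specificity": tn / (tn + fp),
        # Fraction of all predictions that are correct.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only.
print(classification_metrics(tp=8, fp=4, tn=6, fn=2))
```

Accuracy is a weighted average of sensitivity and specificity, which is why the reported accuracy (0.675) lies between the reported sensitivity (0.632) and specificity (0.789).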


Subjects
Drug-Related Side Effects and Adverse Reactions , Models, Statistical , Cardiovascular Diseases/etiology , Cardiovascular Diseases/genetics , Computational Biology , Databases, Factual , Humans , Logistic Models , Protein Interaction Mapping , Support Vector Machine