Results 1 - 20 of 83
1.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38180829

ABSTRACT

Forecasting the interaction between compounds and proteins is crucial for discovering new drugs. However, previous sequence-based studies have not utilized three-dimensional (3D) information on compounds and proteins, such as atom coordinates and distance matrices, to predict binding affinity. Furthermore, numerous widely adopted computational techniques have relied on sequences of amino acid characters for protein representations. This approach may constrain the model's ability to capture meaningful biochemical features, impeding a more comprehensive understanding of the underlying proteins. Here, we propose a two-step deep learning strategy named MulinforCPI that incorporates transfer learning techniques with multi-level resolution features to overcome these limitations. Our approach leverages 3D information from both proteins and compounds and acquires a profound understanding of the atomic-level features of proteins. Besides, our research highlights the divide between first-principle and data-driven methods, offering new research prospects for compound-protein interaction tasks. We applied the proposed method to six datasets: Davis, Metz, KIBA, CASF-2016, DUD-E and BindingDB, to evaluate the effectiveness of our approach.
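The abstract refers to atom coordinates and distance matrices as 3D inputs. As a minimal sketch, assuming NumPy and an (N, 3) array of atom coordinates, a pairwise Euclidean distance matrix can be built as follows; the function name and toy coordinates are hypothetical and are not taken from the MulinforCPI code.

```python
import numpy as np

def distance_matrix(coords: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between atoms (hypothetical helper).

    coords: array of shape (N, 3) holding x, y, z coordinates.
    Returns an (N, N) symmetric matrix with zeros on the diagonal.
    """
    # Broadcast (N, 1, 3) - (1, N, 3) -> (N, N, 3) displacement vectors
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy example: three atoms with made-up coordinates
atoms = np.array([[0.0, 0.0, 0.0],
                  [3.0, 4.0, 0.0],
                  [0.0, 0.0, 1.0]])
D = distance_matrix(atoms)
print(D)
```

Unlike raw coordinates, a distance matrix does not change when the molecule is rotated or translated, which is one reason such matrices are a common structural input.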


Subjects
Amino Acids , Protein Interaction Mapping , Protein Conformation , Protein Binding
2.
Bioinformatics ; 40(Supplement_1): i119-i129, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940167

ABSTRACT

SUMMARY: Recent proprietary large language models (LLMs), such as GPT-4, have achieved a milestone in tackling diverse challenges in the biomedical domain, ranging from multiple-choice questions to long-form generations. To address challenges that still cannot be handled with the encoded knowledge of LLMs, various retrieval-augmented generation (RAG) methods have been developed by searching documents from the knowledge corpus and appending them unconditionally or selectively to the input of LLMs for generation. However, when applying existing methods to different domain-specific problems, poor generalization becomes apparent, leading to fetching incorrect documents or making inaccurate judgments. In this paper, we introduce Self-BioRAG, a reliable framework for biomedical text that specializes in generating explanations, retrieving domain-specific documents, and self-reflecting on generated responses. We utilize 84k filtered biomedical instruction sets to train Self-BioRAG, which can assess its generated explanations with customized reflective tokens. Our work proves that domain-specific components, such as a retriever, domain-related document corpus, and instruction sets are necessary for adhering to domain-related instructions. Using three major medical question-answering benchmark datasets, experimental results of Self-BioRAG demonstrate significant performance gains by achieving a 7.2% absolute improvement on average over the state-of-the-art open-foundation model with a parameter size of 7B or less. Similarly, Self-BioRAG outperforms RAG by 8% Rouge-1 score in generating more proficient answers on two long-form question-answering benchmarks on average. Overall, our analysis indicates that Self-BioRAG finds the clues in the question, retrieves relevant documents if needed, and understands how to answer with information from retrieved documents and encoded knowledge as a medical expert does. We release our data and code for training our framework components and model weights (7B and 13B) to enhance capabilities in biomedical and clinical domains. AVAILABILITY AND IMPLEMENTATION: Self-BioRAG is available at https://github.com/dmis-lab/self-biorag.


Subjects
Information Storage and Retrieval , Humans , Information Storage and Retrieval/methods , Natural Language Processing
3.
Bioinformatics ; 40(Supplement_1): i369-i380, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940143

ABSTRACT

MOTIVATION: Molecular core structures and R-groups are essential concepts in drug development. Integration of these concepts with conventional graph pre-training approaches can promote a deeper understanding of molecules. We propose MolPLA, a novel pre-training framework that employs masked graph contrastive learning in understanding the underlying decomposable parts in molecules that implicate their core structure and peripheral R-groups. Furthermore, we formulate an additional framework that grants MolPLA the ability to help chemists find replaceable R-groups in lead optimization scenarios. RESULTS: Experimental results on molecular property prediction show that MolPLA exhibits predictability comparable to current state-of-the-art models. Qualitative analysis indicates that MolPLA is capable of distinguishing core and R-group sub-structures, identifying decomposable regions in molecules and contributing to lead optimization scenarios by rationally suggesting R-group replacements given various query core templates. AVAILABILITY AND IMPLEMENTATION: The code implementation for MolPLA and its pre-trained model checkpoint is available at https://github.com/dmis-lab/MolPLA.


Subjects
Software , Machine Learning , Molecular Structure , Algorithms , Drug Development/methods
4.
Bioinformatics ; 39(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37261870

ABSTRACT

SUMMARY: Biomedical named entity recognition (NER) plays a crucial role in extracting information from documents in biomedical applications. However, many of these applications require NER models to operate at a document level, rather than just a sentence level. This presents a challenge, as the extension from a sentence model to a document model is not always straightforward. Despite the existence of document NER models that are able to make consistent predictions, they still fall short of meeting the expectations of researchers and practitioners in the field. To address this issue, we have undertaken an investigation into the underlying causes of inconsistent predictions. Our research has led us to believe that the use of adjectives and prepositions within entities may be contributing to low label consistency. In this article, we present our method, ConNER, to enhance the label consistency of modifiers such as adjectives and prepositions. By refining the labels of these modifiers, ConNER is able to improve representations of biomedical entities. The effectiveness of our method is demonstrated on four popular biomedical NER datasets. On three datasets, we achieve a higher F1 score than the previous state-of-the-art model. Our method shows its efficacy on two datasets, resulting in 7.5%-8.6% absolute improvements in the F1 score. Our findings suggest that our ConNER method is effective on datasets with intrinsically low label consistency. Through qualitative analysis, we demonstrate how our approach helps the NER model generate more consistent predictions. AVAILABILITY AND IMPLEMENTATION: Our code and resources are available at https://github.com/dmis-lab/ConNER/.


Subjects
Data Mining , Language , Humans , Data Mining/methods , Research Personnel
5.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36416124

ABSTRACT

MOTIVATION: Compound-protein interaction (CPI) plays an essential role in drug discovery and is performed via expensive molecular docking simulations. Many artificial intelligence-based approaches have been proposed in this regard. Recently, two types of models have accomplished promising results in exploiting molecular information: graph convolutional neural networks that construct a learned molecular representation from a graph structure (atoms and bonds), and neural networks that can be applied to compute on descriptors or fingerprints of molecules. However, the superiority of one method over the other is yet to be determined. Modern studies have endeavored to aggregate information that is extracted from compounds and proteins to form the CPI task. Nonetheless, these approaches have used a simple concatenation to combine them, which cannot fully capture the interaction between such information. RESULTS: We propose the Perceiver CPI network, which adopts a cross-attention mechanism to improve the learning ability of the representation of drug and target interactions and exploits the rich information obtained from extended-connectivity fingerprints to improve the performance. We evaluated Perceiver CPI on three main datasets, Davis, KIBA and Metz, to compare the performance of our proposed model with that of state-of-the-art methods. The proposed method achieved satisfactory performance and exhibited significant improvements over previous approaches in all experiments. AVAILABILITY AND IMPLEMENTATION: Perceiver CPI is available at https://github.com/dmis-lab/PerceiverCPI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
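The abstract contrasts simple concatenation with cross-attention. The sketch below is a generic single-head scaled dot-product cross-attention in NumPy, not the Perceiver CPI implementation; letting compound atoms act as queries over protein residues, along with the dimensions and random projection weights, are assumptions for illustration.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row maximum for numerical stability
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (Lq, d_in) features of one modality (assumed: compound atoms)
    context: (Lc, d_in) features of the other modality (assumed: protein residues)
    Returns (Lq, d) attended features and the (Lq, Lc) attention weights.
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each query row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d_in, d = 16, 8
compound = rng.normal(size=(5, d_in))   # 5 hypothetical atom embeddings
protein = rng.normal(size=(40, d_in))   # 40 hypothetical residue embeddings
w_q, w_k, w_v = (rng.normal(size=(d_in, d)) for _ in range(3))
out, attn = cross_attention(compound, protein, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (5, 8) (5, 40)
```

Unlike concatenation, each compound token receives its own weighted mixture of protein features, and the rows of `attn` can be inspected to see which residues each token draws on.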


Subjects
Artificial Intelligence , Neural Networks, Computer , Molecular Docking Simulation , Proteins/chemistry , Protein Interaction Maps
6.
Bioinformatics ; 39(39 Suppl 1): i448-i457, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387164

ABSTRACT

MOTIVATION: Protein-ligand binding affinity prediction is a central task in drug design and development. Cross-modal attention mechanisms have recently become a core component of many deep learning models due to their potential to improve model explainability. Non-covalent interactions (NCIs), one of the most critical forms of domain knowledge in the binding affinity prediction task, should be incorporated into the protein-ligand attention mechanism for more explainable deep drug-target interaction models. We propose ArkDTA, a novel deep neural architecture for explainable binding affinity prediction guided by NCIs. RESULTS: Experimental results show that ArkDTA achieves predictive performance comparable to current state-of-the-art models while significantly improving model explainability. Qualitative investigation into our novel attention mechanism reveals that ArkDTA can identify potential regions for NCIs between candidate drug compounds and target proteins, as well as guide internal operations of the model in a more interpretable and domain-aware manner. AVAILABILITY: ArkDTA is available at https://github.com/dmis-lab/ArkDTA. CONTACT: kangj@korea.ac.kr.


Subjects
Drug Delivery Systems , Drug Design , Ligands
7.
Bioinformatics ; 38(15): 3794-3801, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35713500

ABSTRACT

MOTIVATION: Current studies in extractive question answering (EQA) have modeled the single-span extraction setting, where a single answer span is a label to predict for a given question-passage pair. This setting is natural for general domain EQA as the majority of the questions in the general domain can be answered with a single span. Following general domain EQA models, current biomedical EQA (BioEQA) models utilize the single-span extraction setting with post-processing steps. RESULTS: In this article, we investigate the question distribution across the general and biomedical domains and discover that biomedical questions are more likely to require list-type answers (multiple answers) than factoid-type answers (single answer). This necessitates models capable of producing multiple answers for a question. Based on this preliminary study, we propose a sequence tagging approach for BioEQA, which is a multi-span extraction setting. Our approach directly tackles questions with a variable number of phrases as their answer and can learn to decide the number of answers for a question from training data. On the BioASQ 7b and 8b list-type questions, our approach outperformed the best-performing existing models without requiring post-processing steps. AVAILABILITY AND IMPLEMENTATION: Source code and resources are freely available for download at https://github.com/dmis-lab/SeqTagQA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
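A sequence tagging formulation lets the number of answers vary per question, because every tagged run becomes its own span. As a minimal sketch, assuming a plain BIO tag set (the paper's exact tagging scheme may differ), answers can be decoded from per-token tags like this; the passage and tags are made up.

```python
from typing import List

def decode_bio(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect every B/I run of tokens as a separate answer span."""
    answers, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new answer begins; close the previous one if open
            if current:
                answers.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # "O" (or a stray "I" with no open span) ends any open answer
            if current:
                answers.append(" ".join(current))
            current = []
    if current:
        answers.append(" ".join(current))
    return answers

tokens = "Symptoms include fever , dry cough and fatigue .".split()
tags = ["O", "O", "B", "O", "B", "I", "O", "B", "O"]
answers = decode_bio(tokens, tags)
print(answers)  # ['fever', 'dry cough', 'fatigue']
```

A passage with no `B` tags yields an empty list, so the model can also express "no answer here" without a separate post-processing threshold.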


Subjects
Computational Biology , Software
8.
Bioinformatics ; 38(20): 4837-4839, 2022 10 14.
Article in English | MEDLINE | ID: mdl-36053172

ABSTRACT

In biomedical natural language processing, named entity recognition (NER) and named entity normalization (NEN) are key tasks that enable the automatic extraction of biomedical entities (e.g. diseases and drugs) from the ever-growing biomedical literature. In this article, we present BERN2 (Advanced Biomedical Entity Recognition and Normalization), a tool that improves the previous neural network-based NER tool by employing a multi-task NER model and neural network-based NEN models to achieve much faster and more accurate inference. We hope that our tool can help annotate large-scale biomedical texts for various tasks such as biomedical knowledge graph construction. AVAILABILITY AND IMPLEMENTATION: Web service of BERN2 is publicly available at http://bern2.korea.ac.kr. We also provide local installation of BERN2 at https://github.com/dmis-lab/BERN2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neural Networks, Computer , Software , Natural Language Processing
9.
Bioinformatics ; 37(Suppl_1): i376-i382, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252937

ABSTRACT

MOTIVATION: Identifying the mechanisms of action (MoAs) of novel compounds is crucial in drug discovery. A careful understanding of MoAs can help avoid potential side effects of drug candidates. Efforts have been made to identify MoAs using the transcriptomic signatures induced by compounds. However, these approaches fail to reveal MoAs in the absence of actual compound signatures. RESULTS: We present MoAble, which predicts MoAs without requiring compound signatures. We train a deep learning-based coembedding model to map compound signatures and compound structures into the same embedding space. The model generates low-dimensional compound signature representations from the compound structures. To predict MoAs, pathway enrichment analysis is performed based on the connectivity between embedding vectors of compounds and those of genetic perturbations. Results show that MoAble is comparable to the methods that use actual compound signatures. We demonstrate that MoAble can be used to reveal MoAs of novel compounds without measuring compound signatures, with the same prediction accuracy as when they are measured. AVAILABILITY AND IMPLEMENTATION: MoAble is available at https://github.com/dmis-lab/moable. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software , Transcriptome , Drug Discovery
10.
PLoS Comput Biol ; 17(9): e1009302, 2021 09.
Article in English | MEDLINE | ID: mdl-34520464

ABSTRACT

A continuing challenge in modern medicine is the identification of safer and more efficacious drugs. Precision therapeutics, which have one molecular target, have been long promised to be safer and more effective than traditional therapies. This approach has proven to be challenging for multiple reasons including lack of efficacy, rapidly acquired drug resistance, and narrow patient eligibility criteria. An alternative approach is the development of drugs that address the overall disease network by targeting multiple biological targets ('polypharmacology'). Rational development of these molecules will require improved methods for predicting single chemical structures that target multiple drug targets. To address this need, we developed the Multi-Targeting Drug DREAM Challenge, in which we challenged participants to predict single chemical entities that target pro-targets but avoid anti-targets for two unrelated diseases: RET-based tumors and a common form of inherited Tauopathy. Here, we report the results of this DREAM Challenge and the development of two neural network-based machine learning approaches that were applied to the challenge of rational polypharmacology. Together, these platforms provide a potentially useful first step towards developing lead therapeutic compounds that address disease complexity through rational polypharmacology.


Subjects
Drug Development , Neoplasms/drug therapy , Protein Kinase Inhibitors/pharmacology , Proto-Oncogene Proteins c-ret/antagonists & inhibitors , Tauopathies/drug therapy , Humans , Neoplasms/metabolism , Neural Networks, Computer , Polypharmacology , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/therapeutic use , Proto-Oncogene Proteins c-ret/genetics , Proto-Oncogene Proteins c-ret/metabolism , tau Proteins/genetics , tau Proteins/metabolism
11.
Nucleic Acids Res ; 48(D1): D817-D824, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31680157

ABSTRACT

Fusion genes represent an important class of biomarkers and therapeutic targets in cancer. ChimerDB is a comprehensive database of fusion genes encompassing analysis of deep sequencing data (ChimerSeq) and text mining of publications (ChimerPub) with extensive manual annotations (ChimerKB). In this update, we present all three modules substantially enhanced by incorporating the recent flood of deep sequencing data and related publications. ChimerSeq now covers all 10 565 patients in the TCGA project, with compilation of computational results from two reliable programs, STAR-Fusion and FusionScan, together with several public resources. In sum, ChimerSeq includes 65 945 fusion candidates, 21 106 of which were predicted by multiple programs (ChimerSeq-Plus). ChimerPub has been upgraded by applying a deep learning method for text mining followed by extensive manual curation, which yielded 1257 fusion genes including 777 cases with experimental support (ChimerPub-Plus). ChimerKB includes 1597 fusion genes with publication support, experimental evidence and breakpoint information. Importantly, we implemented several new features to aid estimation of functional significance, including the fusion structure viewer with domain information, gene expression plot of fusion positive versus negative patients and a STRING network viewer. The user interface was also greatly enhanced by applying responsive web design. ChimerDB 4.0 is available at http://www.kobic.re.kr/chimerdb/.


Subjects
Biomarkers, Tumor/genetics , Computational Biology , Data Management , Databases, Genetic , Neoplasms/genetics , Data Mining , Humans , Neoplasms/therapy , Software , User-Computer Interface
12.
Bioinformatics ; 36(Suppl_1): i389-i398, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657401

ABSTRACT

MOTIVATION: Recent advances in deep learning have offered solutions to many biomedical tasks. However, there remains a challenge in applying deep learning to survival analysis using human cancer transcriptome data. As the number of genes, the input variables of the survival model, is larger than the number of available cancer patient samples, deep-learning models are prone to overfitting. To address the issue, we introduce a new deep-learning architecture called VAECox. VAECox uses transfer learning and fine-tuning. RESULTS: We pre-trained a variational autoencoder on all RNA-seq data in 20 TCGA datasets and transferred the trained weights to our survival prediction model. Then we fine-tuned the transferred weights while training the survival model on each dataset. Results show that our model outperformed previous models such as Cox Proportional Hazards with LASSO and ridge penalties and Cox-nnet on 7 of 10 TCGA datasets in terms of C-index. The results signify that the transferred information obtained from entire cancer transcriptome data helped our survival prediction model reduce overfitting and show robust performance in unseen cancer patient samples. AVAILABILITY AND IMPLEMENTATION: Our implementation of VAECox is available at https://github.com/dmis-lab/VAECox. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
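VAECox is compared with other models using the C-index. For reference, here is a minimal sketch of Harrell's concordance index in plain Python (not code from the VAECox repository). For simplicity it skips pairs with tied survival times, which some implementations treat differently.

```python
from itertools import combinations
from typing import Sequence

def concordance_index(times: Sequence[float], events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell's C-index: share of comparable pairs ranked correctly.

    times:  observed survival or censoring times
    events: 1 if the event (e.g. death) was observed, 0 if censored
    risks:  predicted risk scores (higher means expected to fail sooner)
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified version
        # a = patient with the shorter observed time, b = the other one
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # a was censored first, so the true order is unknown
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

times = [5, 10, 15, 20]
events = [1, 1, 0, 1]   # the third patient is censored
risks = [0.9, 0.3, 0.2, 0.4]
c = concordance_index(times, events, risks)
print(c)  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of patients by risk.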


Subjects
Deep Learning , Neoplasms , Genome , Genomics , Humans , Neoplasms/genetics , Survival Analysis
13.
Bioinformatics ; 36(4): 1234-1240, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31501885

ABSTRACT

MOTIVATION: Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora. In this article, we investigate how the recently introduced pre-trained language model BERT can be adapted for biomedical corpora. RESULTS: We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora. With almost the same architecture across tasks, BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora. While BERT obtains performance comparable to that of previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement) and biomedical question answering (12.24% MRR improvement). Our analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts. AVAILABILITY AND IMPLEMENTATION: We make the pre-trained weights of BioBERT freely available at https://github.com/naver/biobert-pretrained, and the source code for fine-tuning BioBERT available at https://github.com/dmis-lab/biobert.


Subjects
Data Mining , Natural Language Processing , Language , Software
14.
Int J Mol Sci ; 22(16)2021 Aug 23.
Article in English | MEDLINE | ID: mdl-34445802

ABSTRACT

Osteoporosis is commonly treated via the long-term usage of anti-osteoporotic agents; however, poor drug compliance and undesirable side effects limit their treatment efficacy. The parathyroid hormone-related protein (PTHrP) is essential for normal bone formation and remodeling; thus, it may be used as an anti-osteoporotic agent. Here, we developed a platform for the delivery of a single peptide composed of two regions of the PTHrP protein (1-34 and 107-139), mcPTHrP 1-34+107-139, using a minicircle vector. We also transfected mcPTHrP 1-34+107-139 into human mesenchymal stem cells (MSCs) and generated PTHrP 1-34+107-139-producing engineered MSCs (eMSCs) as an alternative delivery system. Osteoporosis was induced in 12-week-old C57BL/6 female mice via ovariectomy. The ovariectomized (OVX) mice were then treated with the two systems: (1) mcPTHrP 1-34+107-139 was intravenously administered three times (once per week); (2) eMSCs were intraperitoneally administered twice (in weeks four and six). Compared with the control OVX mice, the mcPTHrP 1-34+107-139-treated group showed better trabecular bone structure quality, increased bone formation, and decreased bone resorption. Similar results were observed in the eMSC-treated OVX mice. Altogether, these results provide experimental evidence to support the potential of delivering PTHrP 1-34+107-139 using minicircle technology for the treatment of osteoporosis.


Subjects
Bone Resorption/drug therapy , DNA/administration & dosage , Osteogenesis/drug effects , Parathyroid Hormone-Related Protein/administration & dosage , Animals , Bone Density/drug effects , Cell Line , Female , HEK293 Cells , Humans , Injections, Intravenous/methods , Mesenchymal Stem Cells/drug effects , Mice , Mice, Inbred C57BL , Osteoporosis/drug therapy , Ovariectomy/methods
15.
Bioinformatics ; 35(24): 5249-5256, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31116384

ABSTRACT

MOTIVATION: Traditional drug discovery approaches identify a target for a disease and find a compound that binds to the target. In this approach, structures of compounds are considered the most important features because it is assumed that similar structures will bind to the same target. Therefore, structural analogs of the drugs that bind to the target are selected as drug candidates. However, compounds that are not structural analogs may still achieve the desired response. A new drug discovery method based on drug response, which can complement the structure-based methods, is needed. RESULTS: We implemented Siamese neural networks called ReSimNet, which take two chemical compounds as input and predict their CMap score, which we use to measure the transcriptional response similarity of the two compounds. ReSimNet learns the embedding vector of a chemical compound in a transcriptional response space. ReSimNet is trained to minimize the difference between the cosine similarity of the embedding vectors of the two compounds and the CMap score of the two compounds. ReSimNet can find pairs of compounds that are similar in response even though they may have dissimilar structures. In our quantitative evaluation, ReSimNet outperformed the baseline machine learning models. The ReSimNet ensemble model achieves a Pearson correlation of 0.518 and a precision@1% of 0.989. In addition, in the qualitative analysis, we tested ReSimNet on the ZINC15 database and showed that ReSimNet successfully identifies chemical compounds that are relevant to a prototype drug whose mechanism of action is known. AVAILABILITY AND IMPLEMENTATION: The source code and the pre-trained weights of ReSimNet are available at https://github.com/dmis-lab/ReSimNet. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
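The training objective described above can be sketched in NumPy. This is a minimal illustration, not ReSimNet itself: the single tanh layer standing in for the encoder, mean squared error as the "difference", random binary fingerprints, and CMap scores rescaled to [-1, 1] to match the cosine range are all assumptions.

```python
import numpy as np

def encode(fingerprints: np.ndarray, weights: np.ndarray) -> np.ndarray:
    # Stand-in encoder (assumption): one tanh layer. Both compounds of a pair
    # pass through the SAME weights, which is what makes the network "Siamese".
    return np.tanh(fingerprints @ weights)

def cosine_similarity(a: np.ndarray, b: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    # Row-wise cosine similarity; eps guards against zero-length embeddings
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return num / den

def siamese_loss(fp_a, fp_b, cmap_scores, weights) -> float:
    """Mean squared gap between embedding cosine similarity and target score."""
    sim = cosine_similarity(encode(fp_a, weights), encode(fp_b, weights))
    return float(np.mean((sim - cmap_scores) ** 2))

rng = np.random.default_rng(42)
n_pairs, n_bits, dim = 4, 64, 16
fp_a = rng.integers(0, 2, size=(n_pairs, n_bits)).astype(float)  # toy fingerprints
fp_b = rng.integers(0, 2, size=(n_pairs, n_bits)).astype(float)
cmap = np.array([0.9, -0.2, 0.5, 0.0])  # hypothetical rescaled CMap scores
weights = rng.normal(scale=0.1, size=(n_bits, dim))
loss = siamese_loss(fp_a, fp_b, cmap, weights)
print(loss)
```

Training would adjust `weights` by gradient descent to shrink this loss; afterwards, compounds whose embeddings have high cosine similarity are predicted to elicit similar transcriptional responses even when their structures differ.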


Subjects
Neural Networks, Computer , Software , Drug Discovery , Machine Learning
16.
BMC Bioinformatics ; 20(Suppl 10): 249, 2019 May 29.
Article in English | MEDLINE | ID: mdl-31138109

ABSTRACT

BACKGROUND: Finding biomedical named entities is one of the most essential tasks in biomedical text mining. Recently, deep learning-based approaches have been applied to biomedical named entity recognition (BioNER) and shown promising results. However, as deep learning approaches need abundant training data, a lack of data can hinder performance. BioNER datasets are scarce resources and each dataset covers only a small subset of entity types. Furthermore, many bio entities are polysemous, which is one of the major obstacles in named entity recognition. RESULTS: To address the lack of data and the entity type misclassification problem, we propose CollaboNet, which utilizes a combination of multiple NER models. In CollaboNet, models trained on different datasets are connected to each other so that a target model obtains information from other collaborator models to reduce false positives. Every model is an expert on its target entity type and takes turns serving as a target and a collaborator model during training. The experimental results show that CollaboNet can be used to greatly reduce the number of false positives and misclassified entities including polysemous words. CollaboNet achieved state-of-the-art performance in terms of precision, recall and F1 score. CONCLUSIONS: We demonstrated the benefits of combining multiple models for BioNER. Our model has successfully reduced the number of misclassified entities and improved the performance by leveraging multiple datasets annotated for different entity types. Given the state-of-the-art performance of our model, we believe that CollaboNet can improve the accuracy of downstream biomedical text mining applications such as bio-entity relation extraction.


Subjects
Deep Learning , Neural Networks, Computer , Animals , Data Mining , Humans , Mice , Models, Theoretical
17.
Methods ; 145: 10-15, 2018 08 01.
Article in English | MEDLINE | ID: mdl-29758273

ABSTRACT

Determining the functions of a gene requires time-consuming, expensive biological experiments. Scientists can speed up this experimental process if literature information and biological networks are adequately provided. In this paper, we present a web-based information system that can perform in silico experiments that computationally test hypotheses on the function of a gene. A hypothesis specified in English by the user is converted to genes using a literature and knowledge mining system called BEST. Condition-specific TF, miRNA and PPI (protein-protein interaction) networks are automatically generated by projecting gene and miRNA expression data onto template networks. Then, an in silico experiment tests how well the target genes are connected from the knockout gene through the condition-specific networks. The test result visualizes paths from the knockout gene to the target genes in the three networks. Statistical and information-theoretic scores are provided on the resulting web page to help scientists either accept or reject the hypothesis being tested. Our web-based system was extensively tested using three data sets: the E2f1, Lrrk2, and Dicer1 knockout data sets. We were able to reproduce gene functions reported in the original research papers. In addition, we comprehensively tested the system using all disease names in MalaCards as hypotheses to show its effectiveness. Our in silico experiment system can be very useful in suggesting biological mechanisms which can be further tested in vivo or in vitro. AVAILABILITY: http://biohealth.snu.ac.kr/software/insilico/.
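The connectivity test is described only at a high level. As a hypothetical stand-in, breadth-first search over a directed network can report whether, and by which shortest path, each target gene is reachable from the knockout gene. The gene names and edges below are invented, and the system's actual statistical and information-theoretic scores are not reproduced.

```python
from collections import deque
from typing import Dict, List, Optional

def shortest_path(graph: Dict[str, List[str]], source: str,
                  target: str) -> Optional[List[str]]:
    """Breadth-first search for one shortest directed path, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source, then reverse
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Toy condition-specific network (hypothetical genes and edges)
network = {
    "KO": ["TF1", "TF2"],
    "TF1": ["G1"],
    "TF2": ["G2", "G3"],
    "G2": ["G4"],
}
targets = ["G1", "G4", "G5"]
paths = {t: shortest_path(network, "KO", t) for t in targets}
reached = sum(p is not None for p in paths.values())
print(paths)
print(f"{reached}/{len(targets)} targets reachable")  # 2/3 targets reachable
```

Running the same search separately on the TF, miRNA and PPI networks would give one set of paths per network, mirroring the three-network visualization the abstract describes.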


Subjects
Computational Biology , Computer Simulation , Gene Regulatory Networks , Animals , Mice , MicroRNAs/metabolism , Protein Interaction Maps , Transcription Factors/metabolism
18.
BMC Med Imaging ; 19(1): 30, 2019 04 25.
Article in English | MEDLINE | ID: mdl-31023253

ABSTRACT

BACKGROUND: Facial paralysis (FP) is a neuromotor dysfunction that causes loss of voluntary muscle movement on one side of the human face. As the face is the basic means of social interaction and emotional expression among humans, afflicted individuals can become introverted and may develop psychological distress, which can be even more severe than the physical disability. This paper addresses the problem of objective facial paralysis evaluation. METHODS: We present a novel approach for objective facial paralysis evaluation and classification, which is crucial for deciding the medical treatment scheme. For FP classification in particular, we propose a method based on an ensemble of regression trees to efficiently extract facial salient points and detect iris or sclera boundaries. We also employ a second-degree polynomial (parabolic function) to improve Daugman's algorithm for detecting occluded iris boundaries, which allows us to compute the iris area efficiently. The symmetry score of each face is measured by calculating the ratios of the iris areas and of the distances between key points on the two sides of the face. We build a model using a hybrid classifier that discriminates healthy from unhealthy subjects and performs FP classification. RESULTS: Objective analysis was conducted to evaluate the performance of the proposed method. Experiments exploring the effect of data augmentation with publicly available facial expression datasets indicate that the proposed approach is efficient. CONCLUSIONS: Extracting iris and facial salient points from images with an ensemble of regression trees, together with our hybrid classifier (classification tree plus regularized logistic regression), provides an improved way of addressing the FP classification problem. It addresses a common limitation of previous work, namely greater sensitivity to subjects with peculiar facial images, where improper identification of the initial evolving curve for facial feature segmentation results in inaccurate facial feature extraction. Leveraging an ensemble of regression trees provides accurate salient point extraction, which is crucial for revealing the significant difference between the healthy and the palsy side when performing different facial expressions.
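The abstract describes the symmetry score only as ratios of the iris areas and of key-point distances on the two sides of the face. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each ratio is taken as smaller over larger (so 1.0 means perfect symmetry), that key points come as mirrored left/right pairs, and that the ratios are simply collected into a feature vector for a classifier. The names `side_ratio` and `symmetry_features` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def side_ratio(left: float, right: float) -> float:
    """Ratio of a left-side and a right-side measurement, in [0, 1].

    Assumption: smaller / larger, so 1.0 means the two sides match exactly.
    """
    if left < 0 or right < 0:
        raise ValueError("measurements must be non-negative")
    larger = max(left, right)
    if larger == 0:
        return 1.0  # both sides are zero: treat as symmetric
    return min(left, right) / larger


def symmetry_features(
    left_iris_area: float,
    right_iris_area: float,
    left_pairs: Sequence[Tuple[Point, Point]],
    right_pairs: Sequence[Tuple[Point, Point]],
) -> List[float]:
    """Hypothetical feature vector for a downstream classifier.

    The first entry is the iris-area ratio; each following entry compares the
    distance between a pair of key points on the left side with the distance
    between the mirrored pair on the right side.
    """
    if len(left_pairs) != len(right_pairs):
        raise ValueError("both sides need the same number of key-point pairs")
    features = [side_ratio(left_iris_area, right_iris_area)]
    for (l1, l2), (r1, r2) in zip(left_pairs, right_pairs):
        # Euclidean distance between the two key points on each side
        features.append(side_ratio(math.dist(l1, l2), math.dist(r1, r2)))
    return features


if __name__ == "__main__":
    # Toy measurements, made up for illustration (not from the paper).
    feats = symmetry_features(
        120.0,
        100.0,
        left_pairs=[((0.0, 0.0), (3.0, 4.0))],     # distance 5
        right_pairs=[((10.0, 0.0), (10.0, 4.0))],  # distance 4
    )
    print(feats)
```

Taking smaller over larger keeps every score in [0, 1] regardless of which side is affected; a signed or log ratio would instead preserve which side is weaker, which may matter if the classifier must also tell the palsy side apart.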


Subjects
Facial Paralysis/classification , Image Interpretation, Computer-Assisted/methods , Algorithms , Facial Paralysis/psychology , Humans , Introversion, Psychological , Regression Analysis , Sensitivity and Specificity
19.
Nucleic Acids Res ; 45(D1): D784-D789, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899563

ABSTRACT

Fusion genes are an important class of therapeutic targets and prognostic markers in cancer. ChimerDB is a comprehensive database of fusion genes encompassing analysis of deep sequencing data and manual curation. In this update, database coverage was enhanced considerably by adding two new modules: The Cancer Genome Atlas (TCGA) RNA-Seq analysis and PubMed abstract mining. ChimerDB 3.0 is composed of three modules: ChimerKB, ChimerPub and ChimerSeq. ChimerKB is a knowledgebase of 1066 manually curated fusion genes with experimental evidence, compiled from public fusion gene resources. ChimerPub includes 2767 fusion genes obtained from text mining of PubMed abstracts. The ChimerSeq module is designed to archive fusion candidates from deep sequencing data. Importantly, we have analyzed RNA-Seq data from the TCGA project covering 4569 patients in 23 cancer types using two reliable programs, FusionScan and TopHat-Fusion. The new user interface supports diverse search options and graphical representation of fusion gene structure. ChimerDB 3.0 is available at http://ercsb.ewha.ac.kr/fusiongene/.


Subjects
Data Mining , Databases, Genetic , Neoplasms/genetics , Oncogene Proteins, Fusion/genetics , Transcriptome , Computational Biology/methods , Gene Expression Profiling/methods , Humans , Software , User-Computer Interface
20.
BMC Bioinformatics ; 19(1): 21, 2018 01 25.
Article in English | MEDLINE | ID: mdl-29368597

ABSTRACT

BACKGROUND: Molecular biomarkers that can predict drug efficacy in cancer patients are crucial components for the advancement of precision medicine. However, identifying these molecular biomarkers remains a laborious and challenging task. Next-generation sequencing of patients and preclinical models has increasingly led to the identification of novel gene-mutation-drug relations, and these results have been reported and published in the scientific literature. RESULTS: Here, we present two new computational methods that use all PubMed articles as domain-specific background knowledge to assist in the extraction and curation of gene-mutation-drug relations from the literature. The first method uses Biomedical Entity Search Tool (BEST) scoring results as part of the feature set for training machine learning classifiers. The second method uses not only the BEST scoring results but also word vectors in a deep convolutional neural network model; the word vectors are constructed from and trained on numerous documents such as PubMed abstracts and Google News articles. Using features from both BEST search engine scores and word vectors, we extract mutation-gene and mutation-drug relations from the literature with machine learning classifiers such as random forests and deep convolutional neural networks. Our methods achieved better results than state-of-the-art methods. Using our proposed features in a simple machine learning model, we obtained F1-scores of 0.96 and 0.82 for mutation-gene and mutation-drug relation classification, respectively. We also developed a deep learning classification model using convolutional neural networks, BEST scores, and word embeddings pre-trained on PubMed or Google News data. With deep learning, classification accuracy improved, yielding F1-scores of 0.96 and 0.86 for the mutation-gene and mutation-drug relations, respectively.
CONCLUSION: We believe that the computational methods described in this research could serve as an important tool for identifying molecular biomarkers that predict drug responses in cancer patients. We also built a database of the mutation-gene-drug relations extracted from all PubMed abstracts. We believe that this database can prove to be a valuable resource for precision medicine researchers.
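For reference, the F1-scores reported above are presumably the standard F1 measure: the harmonic mean of precision P and recall R, where TP, FP and FN count true positives, false positives and false negatives for a given relation type. The abstract does not state whether scores were averaged across classes, so only the per-class definition is shown:

$$
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2\,P\,R}{P + R}
$$

With illustrative values (not from the paper) P = 0.9 and R = 0.75, this gives F1 = 1.35 / 1.65 ≈ 0.818, lower than the arithmetic mean of 0.825 because the harmonic mean penalizes imbalance between precision and recall.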


Subjects
Drug Resistance, Neoplasm/genetics , Search Engine , Antineoplastic Agents/therapeutic use , Databases, Factual , Humans , Mutation , Neoplasms/drug therapy , Neoplasms/genetics , Neoplasms/pathology , Neural Networks, Computer , Precision Medicine