Results 1 - 18 of 18
1.
PLoS One ; 13(7): e0199094, 2018.
Article in English | MEDLINE | ID: mdl-29975729

ABSTRACT

BACKGROUND/AIMS: The seroclearance of hepatitis B virus (HBV) surface antigen (HBsAg) is regarded as a functional cure of chronic hepatitis B (CHB) although it occurs rarely. Recently, several genome-wide association studies (GWASs) revealed various genetic alterations related to the clinical course of HBV infection. However, all of these studies focused on the progression of HBV infection to chronicity and had limited application because of the heterogeneity of HBV genotypes. In the present study, we aimed to determine susceptibility genetic markers for seroclearance of HBsAg in CHB patients with a homogenous viral genotype. METHODS: One hundred patients with CHB who had experienced HBsAg seroclearance before 60 years of age and another 100 with CHB showing high serum levels of HBsAg even after 60 years of age were enrolled. Extreme-phenotype GWAS was conducted using blood samples of participants. RESULTS: We identified three single nucleotide polymorphisms, rs7944135 (P = 4.17 × 10⁻⁶, odds ratio [OR] = 4.16, 95% confidence interval [CI] = 2.27-7.63) at 11q12.1, rs171941 (P = 3.52 × 10⁻⁶, OR = 3.69, 95% CI = 2.13-6.42) at 5q14.1, and rs6462008 (P = 3.40 × 10⁻⁶, OR = 0.34, 95% CI = 0.22-0.54) at 7p15.2 as novel susceptibility loci associated with HBsAg seroclearance in patients with CHB. The flanking genes at these loci including MPEG1, DTX4, MTX3, and HOXA13 were suggested to have functional significance. In addition, through functional analysis, CXCL13 was also presumed to be related. CONCLUSIONS: To the best of our knowledge, this study is the first GWAS regarding the seroclearance of HBsAg in CHB patients. We identify new susceptibility loci for cure of CHB, providing new insights into its pathophysiology.
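
The odds ratios and confidence intervals quoted in this abstract follow a standard pattern that a short sketch may make more concrete. The Python below computes an odds ratio and a Wald 95% confidence interval from a 2x2 table. It is only an illustration: the function name `odds_ratio_ci` and the counts are hypothetical, and the abstract does not say which association test the authors used (GWAS pipelines often use allelic tests or logistic regression with covariates).

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a, b: group 1 carriers / non-carriers of the allele (hypothetical counts)
    c, d: group 2 carriers / non-carriers of the allele (hypothetical counts)
    z:    normal quantile; 1.96 gives an approximate 95% interval
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts for illustration only; they are not taken from the study.
odds_ratio, lower, upper = odds_ratio_ci(40, 60, 15, 85)
print(f"OR = {odds_ratio:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1, as all three reported intervals do, is what marks an association as nominally significant at the chosen level.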


Subjects
Genetic Predisposition to Disease , Hepatitis B Surface Antigens/genetics , Hepatitis B virus/genetics , Hepatitis B, Chronic/genetics , Adult , Aged , Chemokine CXCL13/genetics , Female , Genome-Wide Association Study , Genotype , Hepatitis B Surface Antigens/blood , Hepatitis B virus/pathogenicity , Hepatitis B, Chronic/blood , Hepatitis B, Chronic/physiopathology , Hepatitis B, Chronic/virology , Homeodomain Proteins/genetics , Humans , Male , Membrane Proteins/genetics , Middle Aged , Polymorphism, Single Nucleotide/genetics , Ubiquitin-Protein Ligases/genetics , Viral Load/genetics
2.
Methods ; 145: 10-15, 2018 08 01.
Article in English | MEDLINE | ID: mdl-29758273

ABSTRACT

Determining the functions of a gene requires time-consuming, expensive biological experiments. Scientists can speed up this experimental process if literature information and biological networks are adequately provided. In this paper, we present a web-based information system that can perform in silico experiments that computationally test hypotheses on the function of a gene. A hypothesis that is specified in English by the user is converted to genes using a literature and knowledge mining system called BEST. Condition-specific TF, miRNA and PPI (protein-protein interaction) networks are automatically generated by projecting gene and miRNA expression data onto template networks. An in silico experiment then tests how well the target genes are connected from the knockout gene through the condition-specific networks. The test result visualizes paths from the knockout gene to the target genes in the three networks. Statistical and information-theoretic scores are provided on the resulting web page to help scientists either accept or reject the hypothesis being tested. Our web-based system was extensively tested using three data sets: the E2f1, Lrrk2, and Dicer1 knockout data sets. We were able to reproduce gene functions reported in the original research papers. In addition, we comprehensively tested the system with all disease names in MalaCards as hypotheses to show its effectiveness. Our in silico experiment system can be very useful in suggesting biological mechanisms which can be further tested in vivo or in vitro. AVAILABILITY: http://biohealth.snu.ac.kr/software/insilico/.
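
The connectivity test described in this abstract (how well target genes are reachable from the knockout gene through a network) can be sketched with a breadth-first search. The Python below is a minimal illustration under stated assumptions, not the system's actual algorithm or scoring: the function name `shortest_paths_from`, the toy network, and all gene names other than E2f1 are hypothetical.

```python
from collections import deque


def shortest_paths_from(network, source):
    """Breadth-first search over a directed adjacency dict.

    Returns {node: shortest path from source as a list of nodes}
    for every node reachable from source.
    """
    paths = {source: [source]}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in network.get(node, ()):
            if neighbor not in paths:
                paths[neighbor] = paths[node] + [neighbor]
                queue.append(neighbor)
    return paths


# Toy directed network; geneA..geneD are placeholders, not real annotations.
network = {
    "E2f1": ["geneA", "geneB"],
    "geneA": ["geneC"],
    "geneB": [],
    "geneD": ["geneC"],
}
targets = ["geneC", "geneD"]  # hypothetical genes derived from a hypothesis

paths = shortest_paths_from(network, "E2f1")
reached = {t: paths[t] for t in targets if t in paths}
fraction_reached = len(reached) / len(targets)
print(reached, fraction_reached)
```

A real system would run this kind of search on each condition-specific network (TF, miRNA, PPI) and attach statistical scores to the result; the fraction of reached targets here is only a stand-in for such a score.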


Subjects
Computational Biology , Computer Simulation , Gene Regulatory Networks , Animals , Mice , MicroRNAs/metabolism , Protein Interaction Maps , Transcription Factors/metabolism
3.
BMC Bioinformatics ; 19(1): 21, 2018 01 25.
Article in English | MEDLINE | ID: mdl-29368597

ABSTRACT

BACKGROUND: Molecular biomarkers that can predict drug efficacy in cancer patients are crucial components for the advancement of precision medicine. However, identifying these molecular biomarkers remains a laborious and challenging task. Next-generation sequencing of patients and preclinical models have increasingly led to the identification of novel gene-mutation-drug relations, and these results have been reported and published in the scientific literature. RESULTS: Here, we present two new computational methods that utilize all the PubMed articles as domain specific background knowledge to assist in the extraction and curation of gene-mutation-drug relations from the literature. The first method uses the Biomedical Entity Search Tool (BEST) scoring results as some of the features to train the machine learning classifiers. The second method uses not only the BEST scoring results, but also word vectors in a deep convolutional neural network model that are constructed from and trained on numerous documents such as PubMed abstracts and Google News articles. Using the features obtained from both the BEST search engine scores and word vectors, we extract mutation-gene and mutation-drug relations from the literature using machine learning classifiers such as random forest and deep convolutional neural networks. Our methods achieved better results compared with the state-of-the-art methods. We used our proposed features in a simple machine learning model, and obtained F1-scores of 0.96 and 0.82 for mutation-gene and mutation-drug relation classification, respectively. We also developed a deep learning classification model using convolutional neural networks, BEST scores, and the word embeddings that are pre-trained on PubMed or Google News data. Using deep learning, the classification accuracy improved, and F1-scores of 0.96 and 0.86 were obtained for the mutation-gene and mutation-drug relations, respectively. 
CONCLUSION: We believe that our computational methods described in this research could be used as an important tool in identifying molecular biomarkers that predict drug responses in cancer patients. We also built a database of these mutation-gene-drug relations that were extracted from all the PubMed abstracts. We believe that our database can prove to be a valuable resource for precision medicine researchers.


Subjects
Drug Resistance, Neoplasm/genetics , Search Engine , Antineoplastic Agents/therapeutic use , Databases, Factual , Humans , Mutation , Neoplasms/drug therapy , Neoplasms/genetics , Neoplasms/pathology , Neural Networks, Computer , Precision Medicine
4.
PLoS One ; 12(3): e0174999, 2017.
Article in English | MEDLINE | ID: mdl-28362846

ABSTRACT

miRNAs are small non-coding RNAs that regulate gene expression by binding to the 3'-UTR of genes. Many recent studies have reported that miRNAs play important biological roles by regulating specific mRNAs or genes. Many sequence-based target prediction algorithms have been developed to predict miRNA targets. However, these methods are not designed for condition-specific target predictions and produce many false positives; thus, expression-based target prediction algorithms have been developed for condition-specific target predictions. A typical strategy to utilize expression data is to leverage the negative control roles of miRNAs on genes. To control false positives, a stringent cutoff value is typically set, but in this case, these methods tend to reject many true target relationships, i.e., false negatives. To overcome these limitations, additional information should be utilized. The literature is probably the best resource that we can utilize. Recent literature mining systems compile millions of articles with experiments designed for specific biological questions, and the systems provide a function to search for specific information. To utilize the literature information, we used a literature mining system, BEST, that automatically extracts information from the literature in PubMed and that allows the user to perform searches of the literature with any English words. By integrating omics data analysis methods and BEST, we developed Context-MMIA, a miRNA-mRNA target prediction method that combines expression data analysis results and the literature information extracted based on the user-specified context. In the pathway enrichment analysis using genes included in the top 200 miRNA-targets, Context-MMIA outperformed the four existing target prediction methods that we tested. In another test on whether prediction methods can re-produce experimentally validated target relationships, Context-MMIA outperformed the four existing target prediction methods. 
In summary, Context-MMIA allows the user to specify a context of the experimental data to predict miRNA targets, and we believe that Context-MMIA is very useful for predicting condition-specific miRNA targets.


Subjects
Literature , MicroRNAs/metabolism , RNA, Messenger/metabolism , 3' Untranslated Regions/genetics , 3' Untranslated Regions/physiology , Algorithms , Computational Biology , Humans , PubMed , Software
5.
PLoS One ; 11(10): e0164680, 2016.
Article in English | MEDLINE | ID: mdl-27760149

ABSTRACT

As the volume of publications rapidly increases, searching for relevant information from the literature becomes more challenging. To complement standard search engines such as PubMed, it is desirable to have an advanced search tool that directly returns relevant biomedical entities such as targets, drugs, and mutations rather than a long list of articles. Some existing tools submit a query to PubMed and process retrieved abstracts to extract information at query time, resulting in a slow response time and limited coverage of only a fraction of the PubMed corpus. Other tools preprocess the PubMed corpus to speed up the response time; however, they are not constantly updated, and thus produce outdated results. Further, most existing tools cannot process sophisticated queries such as searches for mutations that co-occur with query terms in the literature. To address these problems, we introduce BEST, a biomedical entity search tool. BEST returns, as a result, a list of 10 different types of biomedical entities including genes, diseases, drugs, targets, transcription factors, miRNAs, and mutations that are relevant to a user's query. To the best of our knowledge, BEST is the only system that processes free text queries and returns up-to-date results in real time including mutation information in the results. BEST is freely accessible at http://best.korea.ac.kr.


Subjects
Biomedical Research , Data Mining/methods , Drug Resistance/genetics , Mutation , Publications , User-Computer Interface
6.
Biol Direct ; 11(1): 57, 2016 10 24.
Article in English | MEDLINE | ID: mdl-27776539

ABSTRACT

MOTIVATION: Transcriptome data from gene knockout experiments in mice are widely used to investigate the functions of genes and their relationship to phenotypes. When a gene is knocked out, it is important to identify which genes are affected by the knockout. Existing methods, including differentially expressed gene (DEG) methods, can be used for the analysis. However, existing methods require cutoff values to select candidate genes, which can produce either too many false positives or too many false negatives. This hurdle can be addressed by improving the accuracy of gene selection, by providing a method to rank candidate genes effectively, or both. Prioritization of candidate genes should consider the goals or context of the knockout experiment. As of now, there are no tools designed for both selecting and prioritizing genes from mouse knockout data. Hence, a new tool is needed. RESULTS: In this study, we present CLIP-GENE, a web service that selects gene markers by utilizing differentially expressed genes, the mouse transcription factor (TF) network, and single nucleotide variant information. Then, the protein-protein interaction network and literature information are utilized to find genes that are relevant to the phenotypic differences. One of the novel features is that researchers can specify their contexts or hypotheses as a set of keywords, and genes are ranked according to the contexts that the user specifies. We believe that CLIP-GENE will be useful in characterizing functions of TFs in mouse experiments. AVAILABILITY: http://epigenomics.snu.ac.kr/CLIP-GENE REVIEWERS: This article was reviewed by Dr. Lee and Dr. Pongor.


Subjects
Computational Biology/methods , Transcription Factors/genetics , Transcriptome , Animals , Internet , Mice , Mice, Knockout , Oligonucleotide Array Sequence Analysis
7.
Bioinformatics ; 32(18): 2886-8, 2016 09 15.
Article in English | MEDLINE | ID: mdl-27485446

ABSTRACT

UNLABELLED: We introduce HiPub, a seamless Chrome browser plug-in that automatically recognizes, annotates and translates biomedical entities from texts into networks for knowledge discovery. Using a combination of two different named-entity recognition resources, HiPub can recognize genes, proteins, diseases, drugs, mutations and cell lines in texts, and achieve high precision and recall. HiPub extracts biomedical entity-relationships from texts to construct context-specific networks, and integrates existing network data from external databases for knowledge discovery. It allows users to add additional entities from related articles, as well as user-defined entities for discovering new and unexpected entity-relationships. HiPub provides functional enrichment analysis on the biomedical entity network, and link-outs to external resources to assist users in learning new entities and relations. AVAILABILITY AND IMPLEMENTATION: HiPub and detailed user guide are available at http://hipub.korea.ac.kr CONTACT: kangj@korea.ac.kr, aikchoon.tan@ucdenver.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Data Curation , Databases, Factual , Pattern Recognition, Automated , Algorithms , Computational Biology/methods , Genes , Humans , Pharmaceutical Preparations , Proteins , PubMed , Search Engine
8.
Article in English | MEDLINE | ID: mdl-27074804

ABSTRACT

Comprehensive knowledge of genomic variants in a biological context is key for precision medicine. As next-generation sequencing technologies improve, the amount of literature containing genomic variant data, such as new functions or related phenotypes, rapidly increases. Because numerous articles are published every day, it is almost impossible to manually curate all the variant information from the literature. Many researchers focus on creating an improved automated biomedical natural language processing (BioNLP) method that extracts useful variants and their functional information from the literature. However, there is no gold-standard data set that contains texts annotated with variants and their related functions. To overcome these limitations, we introduce a Biomedical entity Relation ONcology COrpus (BRONCO) that contains more than 400 variants and their relations with genes, diseases, drugs and cell lines in the context of cancer and anti-tumor drug screening research. The variants and their relations were manually extracted from 108 full-text articles. BRONCO can be utilized to evaluate and train new methods used for extracting biomedical entity relations from full-text publications, and thus be a valuable resource to the biomedical text mining research community. Using BRONCO, we quantitatively and qualitatively evaluated the performance of three state-of-the-art BioNLP methods. We also identified their shortcomings, and suggested remedies for each method. We implemented post-processing modules for the three BioNLP methods, which improved their performance. Database URL: http://infos.korea.ac.kr/bronco.


Subjects
Database Management Systems , Databases, Genetic , Disease/genetics , Genomics , Chromosome Mapping , DNA Mutational Analysis , Data Curation , Humans
9.
BMC Med Imaging ; 16: 23, 2016 Mar 12.
Article in English | MEDLINE | ID: mdl-26968938

ABSTRACT

BACKGROUND: Facial palsy or paralysis (FP) is the loss of voluntary muscle movement on one side of the face, which can be devastating for patients. Traditional assessment methods depend solely on the clinician's judgment and are therefore time-consuming and subjective. A quantitative assessment system is thus invaluable for physicians beginning the rehabilitation process, but producing a reliable and robust method remains challenging and is still under way. METHODS: We introduce a novel approach to the quantitative assessment of facial paralysis that addresses the classification of FP type and degree of severity. Specifically, we present an algorithm that extracts the human iris and detects facial landmarks, together with a hybrid approach that combines rule-based and machine learning algorithms to analyze and prognosticate facial paralysis from the captured images. A method combining the optimized Daugman's algorithm and the Localized Active Contour (LAC) model is proposed to efficiently extract the iris and the facial landmarks, or key points. To improve the performance of LAC, appropriate parameters of the initial evolving curve for facial feature segmentation are selected automatically. The symmetry score is measured as the ratio between features extracted from the two sides of the face. Hybrid classifiers (i.e., rule-based with regularized logistic regression) were employed to discriminate healthy from unhealthy subjects, to classify FP type, and to grade facial paralysis on the House-Brackmann (H-B) scale. RESULTS: Quantitative analysis was performed to evaluate the performance of the proposed approach. Experiments show that the proposed method is efficient. 
CONCLUSIONS: Facial movement feature extraction from facial images, based on iris segmentation and LAC-based key point detection and combined with a hybrid classifier, provides a more efficient way of classifying facial palsy type and degree of severity. Combining iris segmentation with the key point-based method has several merits that are essential for our real application. Aside from the facial key points, iris segmentation makes a significant contribution because it describes the changes in iris exposure while the subject performs facial expressions. It reveals the significant difference between the healthy side and the severe palsy side when the subject raises the eyebrows with both eyes directed upward, and it can model the typical changes in the iris region.


Subjects
Facial Paralysis/physiopathology , Iris/pathology , Algorithms , Facial Paralysis/diagnosis , Humans , Image Interpretation, Computer-Assisted/methods
10.
BMC Bioinformatics ; 17(Suppl 17): 477, 2016 Dec 23.
Article in English | MEDLINE | ID: mdl-28155707

ABSTRACT

BACKGROUND: The primary goal of pathway analysis using transcriptome data is to find significantly perturbed pathways. However, pathway analysis is not always successful in identifying pathways that are truly relevant to the context under study. A major reason for this difficulty is that a single gene is involved in multiple pathways. In the KEGG pathway database, there are 146 genes, each of which is involved in more than 20 pathways. Thus activation of even a single gene will result in activation of many pathways. This complex relationship often makes the pathway analysis very difficult. While we need much more powerful pathway analysis methods, a readily available alternative way is to incorporate the literature information. RESULTS: In this study, we propose a novel approach for prioritizing pathways by combining results from both pathway analysis tools and literature information. The basic idea is as follows. Whenever there are enough articles that provide evidence on which pathways are relevant to the context, we can be assured that the pathways are indeed related to the context, which is termed as relevance in this paper. However, if there are few or no articles reported, then we should rely on the results from the pathway analysis tools, which is termed as significance in this paper. We realized this concept as an algorithm by introducing Context Score and Impact Score and then combining the two into a single score. Our method ranked truly relevant pathways significantly higher than existing pathway analysis tools in experiments with two data sets. CONCLUSIONS: Our novel framework was implemented as ContextTRAP by utilizing two existing tools, TRAP and BEST. ContextTRAP will be a useful tool for the pathway based analysis of gene expression data since the user can specify the context of the biological experiment in a set of keywords. The web version of ContextTRAP is available at http://biohealth.snu.ac.kr/software/contextTRAP .


Subjects
Computational Biology/methods , Gene Expression , Metabolic Networks and Pathways , Software , Algorithms , Humans , Transcriptome
11.
12.
Bioinformatics ; 31(18): 3069-71, 2015 Sep 15.
Article in English | MEDLINE | ID: mdl-25990557

ABSTRACT

UNLABELLED: We report the creation of Drug Signatures Database (DSigDB), a new gene set resource that relates drugs/compounds and their target genes, for gene set enrichment analysis (GSEA). DSigDB currently holds 22 527 gene sets, consists of 17 389 unique compounds covering 19 531 genes. We also developed an online DSigDB resource that allows users to search, view and download drugs/compounds and gene sets. DSigDB gene sets provide seamless integration to GSEA software for linking gene expressions with drugs/compounds for drug repurposing and translational research. AVAILABILITY AND IMPLEMENTATION: DSigDB is freely available for non-commercial use at http://tanlab.ucdenver.edu/DSigDB. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: aikchoon.tan@ucdenver.edu.


Subjects
Computational Biology/methods , Databases, Pharmaceutical , Gene Expression Profiling , Gene Expression Regulation , Lung Neoplasms/genetics , Protein Kinase Inhibitors/pharmacology , Software , Carcinoma, Non-Small-Cell Lung/drug therapy , Carcinoma, Non-Small-Cell Lung/genetics , Drug Repositioning , ErbB Receptors/antagonists & inhibitors , ErbB Receptors/genetics , Humans , Lung Neoplasms/drug therapy , Mutation/genetics
13.
J Clin Neurol ; 11(2): 142-8, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25851892

ABSTRACT

BACKGROUND AND PURPOSE: Recent advances in information technology have created opportunities for advances in the management of stroke. The objective of this study was to test the feasibility of using a smartphone software application (app) for the management of vascular risk factors in patients with stroke. METHODS: This prospective clinical trial developed a smartphone app, the 'Korea University Health Monitoring System for Stroke: KUHMS2,' for use by patients with stroke. During a 6-month follow-up period, its feasibility was assessed by measuring the changes in their vascular risk-factor profiles and the number of days per patient with data registration into the app. The effect of the app on the achievement rate of risk-factor targets was assessed by classifying subjects into compliant and noncompliant groups. RESULTS: At the end of the trial, data on 48 patients were analyzed. The number of days on which data were registered into the app was 60.42±50.17 (mean±standard deviation). Among predefined vascular risk factors, the target achievement rate for blood pressure and glycated hemoglobin (Hb(A1c)) improved significantly from baseline to the final measurement. The serial changes in achievement rates for risk-factor targets did not differ between the compliant and noncompliant groups. CONCLUSIONS: Many challenges must be overcome before mobile apps can be used for patients with stroke. Nevertheless, the app tested in this study induced a shift in the risk profiles in a favorable direction among the included stroke patients.

14.
Bioinformatics ; 30(1): 135-6, 2014 Jan 01.
Article in English | MEDLINE | ID: mdl-24149052

ABSTRACT

SUMMARY: Biomedical Entity-Relationship eXplorer (BEReX) is a new biomedical knowledge integration, search and exploration tool. BEReX integrates eight popular databases (STRING, DrugBank, KEGG, PharmGKB, BioGRID, GO, HPRD and MSigDB) and delineates an integrated network by combining the information available from these databases. Users search the integrated network by entering key words, and BEReX returns a sub-network matching the key words. The resulting graph can be explored interactively. BEReX allows users to find the shortest paths between two remote nodes, find the most relevant drugs, diseases, pathways and so on related to the current network, expand the network by particular types of entities and relations and modify the network by removing or adding selected nodes. BEReX is implemented as a standalone Java application. AVAILABILITY AND IMPLEMENTATION: BEReX and a detailed user guide are available for download at our project Web site (http://infos.korea.ac.kr/berex).


Subjects
User-Computer Interface , Algorithms , Biomedical Technology , Computational Biology/methods , Databases, Factual , Humans , Neural Networks, Computer
15.
BMC Med Inform Decis Mak ; 13 Suppl 1: S7, 2013.
Article in English | MEDLINE | ID: mdl-23566263

ABSTRACT

BACKGROUND: Most previous Protein Protein Interaction (PPI) studies evaluated their algorithms' performance based on "per-instance" precision and recall, in which the instances of an interaction relation were evaluated independently. However, we argue that this standard evaluation method should be revisited. In a large corpus, the same relation can be described in various different forms and, in practice, correctly identifying not all but a small subset of them would often suffice to detect the given interaction. METHODS: In this regard, we propose a more pragmatic "per-relation" basis performance evaluation method instead of the conventional per-instance basis method. In the per-relation basis method, only a subset of a relation's instances needs to be correctly identified to make the relation positive. In this work, we also introduce a new high-precision rule-based PPI extraction algorithm. While virtually all current PPI extraction studies focus on improving F-score, aiming to balance the performance on both precision and recall, in many realistic scenarios involving large corpora, one can benefit more from a high-precision algorithm than a high-recall counterpart. RESULTS: We show that our algorithm not only achieves better per-relation performance than previous solutions but also serves as a good complement to the existing PPI extraction tools. Our algorithm improves the performance of the existing tools through simple pipelining. CONCLUSION: The significance of this research can be found in that this research brought new perspective to the performance evaluation of PPI extraction studies, which we believe is more important in practice than existing evaluation criteria. Given the new evaluation perspective, we also showed the importance of a high-precision extraction tool and validated the efficacy of our rule-based system as the high-precision tool candidate.
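
The difference between per-instance and per-relation evaluation can be shown with a small recall calculation. The Python below is a sketch under assumptions: the function names, the `min_hits` threshold (how many instances must be found for a relation to count), and the toy protein pairs are all hypothetical, since the abstract does not give the exact definition used in the paper.

```python
from collections import defaultdict


def per_instance_recall(gold, predicted):
    """Fraction of gold instances that were extracted.

    gold:      list of (relation, instance_id) pairs
    predicted: set of (relation, instance_id) pairs found by the extractor
    """
    hits = sum(1 for item in gold if item in predicted)
    return hits / len(gold)


def per_relation_recall(gold, predicted, min_hits=1):
    """Fraction of distinct relations with at least min_hits instances found."""
    by_relation = defaultdict(list)
    for relation, instance_id in gold:
        by_relation[relation].append((relation, instance_id))
    found = sum(
        1
        for instances in by_relation.values()
        if sum(1 for item in instances if item in predicted) >= min_hits
    )
    return found / len(by_relation)


# Toy data: the P1-P2 interaction is mentioned three times, P3-P4 once.
gold = [
    (("P1", "P2"), 1),
    (("P1", "P2"), 2),
    (("P1", "P2"), 3),
    (("P3", "P4"), 4),
]
predicted = {(("P1", "P2"), 2), (("P3", "P4"), 4)}

instance_recall = per_instance_recall(gold, predicted)
relation_recall = per_relation_recall(gold, predicted)
print(instance_recall, relation_recall)
```

In this toy case the extractor misses half of the individual mentions yet still detects every interaction, which is the situation the abstract argues a high-precision extractor can exploit on large corpora.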


Subjects
Computational Biology/standards , Decision Support Techniques , Information Storage and Retrieval/methods , Protein Interaction Mapping/standards , Humans , Natural Language Processing , Pattern Recognition, Automated
16.
Int J Data Min Bioinform ; 6(5): 535-56, 2012.
Article in English | MEDLINE | ID: mdl-23155781

ABSTRACT

The recent advance in SNP genotyping has made a significant contribution to reduction of the costs for large-scale genotyping. The development also has dramatically increased the size of the SNP genotype data. The increase in the volume of the data, however, has posed a huge obstacle to the conventional analysis techniques that are typically vulnerable to the high-dimensionality problem. To address the issue, we propose a method that exploits two well-tested models: the document-term model and the transaction analysis model. The proposed method consists of two phases. In the first phase, we reduce the dimensions of the SNP genotype data by extracting significant SNPs through transformation of the data in lieu of the document-term model. In the second phase, we discover the association rules that signify the relations between the SNPs and the traits, through the application of transactional analysis in the reduced-dimension genotype data. We validated the discovered rules through literature survey. Experiments were also carried out using the HGDP panel data provided by the Foundation Jean Dausset-CEPH, which prove the validity of our new method for identifying appropriate dimensional reduction and associations of multiple SNPs and traits. This paper is an extended version of our workshop paper presented in the 2010 International Workshop on Data Mining for High Throughput Data from Genome-Wide Association Studies.


Subjects
Genome-Wide Association Study/methods , Genotype , Data Mining/methods , Genome, Human , Humans , Polymorphism, Single Nucleotide
17.
BMC Med Inform Decis Mak ; 12 Suppl 1: S7, 2012 Apr 30.
Article in English | MEDLINE | ID: mdl-22595092

ABSTRACT

BACKGROUND: Many academic search solutions exist, and most sit at one of two ends of a spectrum: general-purpose search systems and domain-specific "deep" search systems. General-purpose search systems, such as PubMed, offer a flexible query interface but return a list of matching documents that users must read through to find the answers to their queries. "Deep" search systems, such as PPI Finder and iHOP, return precompiled results in a structured way; their results, however, are often available only within predefined contexts. To alleviate these problems, we introduce a new search engine, BOSS, the Biomedical Object Search System. METHODS: Unlike conventional search systems, BOSS indexes segments rather than documents. A segment is a Maximal Coherent Semantic Unit (MCSU), such as a phrase, clause or sentence, that is semantically coherent in the given context (e.g., biomedical objects or their relations). For a user query, BOSS finds all matching segments, identifies the objects appearing in those segments, and aggregates the segments for each object. Finally, it returns a ranked list of the objects along with their matching segments. RESULTS: The working prototype of BOSS is available at http://boss.korea.ac.kr. The current version of BOSS has indexed the abstracts of more than 20 million articles published over the 16 years from 1996 to 2011 across all science disciplines. CONCLUSION: BOSS fills the gap between the two ends of the spectrum by allowing users to pose context-free queries and by returning a structured set of results. Furthermore, BOSS scales well, as conventional document search engines do, because it is designed to use a standard document-indexing model with minimal modifications. With these features, BOSS raises the technological level of traditional solutions for searching biomedical information.
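
The query flow in the METHODS part of this abstract (match segments, identify the objects in them, aggregate segments per object, return a ranked list) can be sketched in a few lines. The Python below is a toy illustration only: the function name `search_objects`, the whole-word matching, the ranking by number of matching segments, and the example segments are all assumptions, since the abstract does not describe how BOSS tokenizes, matches or ranks.

```python
from collections import defaultdict

# Each segment is (text, set of objects mentioned in it). Toy data only.
segments = [
    ("BRCA1 interacts with RAD51 in DNA repair", {"BRCA1", "RAD51"}),
    ("RAD51 is required for DNA repair", {"RAD51"}),
    ("TP53 regulates apoptosis", {"TP53"}),
]


def search_objects(segments, query):
    """Return [(object, matching segment texts)] ranked by segment count."""
    terms = query.lower().split()
    matches = defaultdict(list)
    for text, objects in segments:
        words = text.lower().split()
        # A segment matches when it contains every query term as a word.
        if all(term in words for term in terms):
            for obj in objects:
                matches[obj].append(text)
    # Assumed ranking: more matching segments first, ties broken by name.
    return sorted(matches.items(), key=lambda kv: (-len(kv[1]), kv[0]))


results = search_objects(segments, "DNA repair")
for obj, texts in results:
    print(obj, len(texts))
```

Because the unit of indexing is the segment rather than the document, the answer is a list of objects with supporting evidence, not a list of articles to read.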


Subjects
Biomedical Enhancement , Search Engine/methods , Semantics , Abstracting and Indexing/standards , Abstracting and Indexing/statistics & numerical data , Abstracting and Indexing/trends , Humans , Reproducibility of Results
18.
PLoS One ; 5(12): e14305, 2010 Dec 21.
Article in English | MEDLINE | ID: mdl-21200431

ABSTRACT

Recent studies suggest that the deregulation of pathways, rather than individual genes, may be critical in triggering carcinogenesis. The pathway deregulation is often caused by the simultaneous deregulation of more than one gene in the pathway. This suggests that robust gene pair combinations may exploit the underlying bio-molecular reactions that are relevant to the pathway deregulation and thus they could provide better biomarkers for cancer, as compared to individual genes. In order to validate this hypothesis, in this paper, we used gene pair combinations, called doublets, as input to the cancer classification algorithms, instead of the original expression values, and we showed that the classification accuracy was consistently improved across different datasets and classification algorithms. We validated the proposed approach using nine cancer datasets and five classification algorithms including Prediction Analysis for Microarrays (PAM), C4.5 Decision Trees (DT), Naive Bayesian (NB), Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN).
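
The idea of feeding gene pair combinations ("doublets") to a classifier instead of single-gene expression values can be sketched as a feature transformation. The Python below is a hedged illustration: the abstract does not state how the two genes in a pair are combined, so the default used here (the difference of the two expression values) is an assumption, as are the function name `make_doublets` and the toy sample.

```python
from itertools import combinations


def make_doublets(sample, combine=lambda x, y: x - y):
    """Turn one sample's expression values into gene-pair features.

    sample:  dict mapping gene name -> expression value
    combine: how the two values are merged; difference is an assumption
    Returns a dict mapping (gene1, gene2) -> combined value, with genes
    paired in sorted order so feature names are stable across samples.
    """
    genes = sorted(sample)
    return {
        (g1, g2): combine(sample[g1], sample[g2])
        for g1, g2 in combinations(genes, 2)
    }


# Toy expression values; gene names are placeholders.
sample = {"geneA": 5.0, "geneB": 2.0, "geneC": 3.5}
doublets = make_doublets(sample)
print(doublets)
```

With n genes this produces n(n-1)/2 features per sample, so in practice some gene filtering or feature selection would normally come first; the resulting feature vectors can then be passed to any of the classifiers named in the abstract.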


Subjects
Gene Expression Regulation, Neoplastic , Neoplasms/classification , Neoplasms/diagnosis , Algorithms , Bayes Theorem , Biomarkers, Tumor , Computational Biology , Humans , Models, Biological , Models, Genetic , Models, Statistical , Oligonucleotide Array Sequence Analysis , Reproducibility of Results , Sequence Analysis, DNA/methods