Search | Nursing VHL Search Portal

Interactive Machine Learning by Visualization: A Small Data Solution.

Li, Huang; Fang, Shiaofen; Mukhopadhyay, Snehasis; Saykin, Andrew J; Shen, Li.

Proc IEEE Int Conf Big Data ; 2018: 3513-3521, 2018 Dec.

Article in English | MEDLINE | ID: mdl-31061990

ABSTRACT

Machine learning algorithms and traditional data mining processes usually require a large volume of data to train the algorithm-specific models, with little or no user feedback during the model building process. Such a "big data" based automatic learning strategy is sometimes unrealistic for applications where data collection or processing is very expensive or difficult, such as in clinical trials. Furthermore, expert knowledge can be very valuable in the model building process in some fields such as biomedical sciences. In this paper, we propose a new visual analytics approach to interactive machine learning and visual data mining. In this approach, multi-dimensional data visualization techniques are employed to facilitate user interactions with the machine learning and mining process. This allows dynamic user feedback in different forms, such as data selection, data labeling, and data correction, to enhance the efficiency of model building. In particular, this approach can significantly reduce the amount of data required for training an accurate model, and therefore can be highly impactful for applications where large amounts of data are hard to obtain. The proposed approach is tested on two application problems: the handwriting recognition (classification) problem and the human cognitive score prediction (regression) problem. Both experiments show that visualization-supported interactive machine learning and data mining can achieve the same accuracy as an automatic process can with much smaller training data sets.

Identification of biological relationships from text documents using efficient computational methods.

Palakal, Mathew; Stephens, Matthew; Mukhopadhyay, Snehasis; Raje, Rajeev; Rhodes, Simon.

J Bioinform Comput Biol ; 1(2): 307-42, 2003 Jul.

Article in English | MEDLINE | ID: mdl-15290775

ABSTRACT

The biological literature databases continue to grow rapidly with vital information that is important for conducting sound biomedical research and development. The current practices of manually searching for information and extracting pertinent knowledge are tedious, time-consuming tasks even for motivated biological researchers. Accurate and computationally efficient approaches in discovering relationships between biological objects from text documents are important for biologists to develop biological models. The term "object" refers to any biological entity such as a protein, gene, cell cycle, etc., and "relationship" refers to any dynamic action one object has on another (e.g., a protein inhibiting another protein) or to one object belonging to another object (e.g., the cells composing an organ). This paper presents a novel approach to extract relationships between multiple biological objects that are present in a text document. The approach involves object identification, reference resolution, ontology and synonym discovery, and extracting object-object relationships. Hidden Markov Models (HMMs), dictionaries, and N-Gram models are used to set the framework to tackle the complex task of extracting object-object relationships. Experiments were carried out using a corpus of one thousand Medline abstracts. Intermediate results were obtained for the object identification process, synonym discovery, and finally the relationship extraction. For the thousand abstracts, 53 relationships were extracted of which 43 were correct, giving a specificity of 81 percent. These results are promising for multi-object identification and relationship finding from biological documents.

Subject(s)

Artificial Intelligence , Biopolymers/metabolism , Cell Physiological Phenomena , Computational Biology/methods , Information Storage and Retrieval/methods , Natural Language Processing , Periodicals as Topic , Algorithms , Biology/methods , Documentation , Models, Biological , Terminology as Topic , Vocabulary, Controlled

Feature decomposition architectures for neural networks: algorithms, error bounds, and applications.

Wang, Haiying; Mukhopadhyay, Snehasis; Fang, Shiaofen.

Int J Neural Syst ; 12(1): 69-81, 2002 Feb.

Article in English | MEDLINE | ID: mdl-11852445

ABSTRACT

In recent years, systems consisting of multiple modular neural networks have attracted substantial interest in the neural networks community because of various advantages they offer over a single large monolithic network. In this paper, we propose two basic feature decomposition models (namely, parallel model and tandem model) in which each of the neural network modules processes a disjoint subset of the input features. A novel feature decomposition algorithm is introduced to partition the input space into disjoint subsets solely based on the available training data. Under certain assumptions, the approximation error due to decomposition can be proved to be bounded by any desired small value over a compact set. Finally, the performance of feature decomposition networks is compared with that of a monolithic network in real-world benchmark pattern recognition and modeling problems.

Subject(s)

Algorithms , Neural Networks, Computer , Speech Perception

Decentralized indirect methods for learning automata games.

Tilak, Omkar; Martin, Ryan; Mukhopadhyay, Snehasis.

IEEE Trans Syst Man Cybern B Cybern ; 41(5): 1213-23, 2011 Oct.

Article in English | MEDLINE | ID: mdl-21925998

ABSTRACT

We discuss the application of indirect learning methods in zero-sum and identical payoff learning automata games. We propose a novel decentralized version of the well-known pursuit learning algorithm. Such a decentralized algorithm has significant computational advantages over its centralized counterpart. The theoretical study of such a decentralized algorithm requires the analysis to be carried out in a nonstationary environment. We use a novel bootstrapping argument to prove the convergence of the algorithm. To our knowledge, this is the first time that such analysis has been carried out for zero-sum and identical payoff games. Extensive simulation studies are reported, which demonstrate the proposed algorithm's fast and accurate convergence in a variety of game scenarios. We also introduce the framework of partial communication in the context of identical payoff games of learning automata. In such games, the automata may not communicate with each other or may communicate selectively. This comprehensive framework has the capability to model both centralized and decentralized games discussed in this paper.
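As a point of reference for the pursuit learning idea mentioned in this abstract, the following is a minimal sketch of a single, textbook-style pursuit automaton acting in a stationary Bernoulli-reward environment. It is not the decentralized game algorithm proposed in the paper, whose details are not given here. The function name `pursuit_automaton`, the parameters `rate`, `steps`, and `seed`, and the reward model are all assumptions made for illustration.

```python
import random

def pursuit_automaton(reward_probs, steps=2000, rate=0.01, seed=0):
    """Run one pursuit-style learning automaton (illustrative sketch only).

    reward_probs[i] is the (hypothetical) probability that action i is rewarded.
    Returns the final action-probability vector and the reward estimates.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    p = [1.0 / n] * n        # action probabilities, start uniform
    counts = [0] * n         # how often each action was tried
    rewards = [0.0] * n      # total reward collected per action
    estimates = [0.0] * n    # running estimate of each action's reward rate
    for _ in range(steps):
        # Sample an action according to the current probability vector.
        action = rng.choices(range(n), weights=p)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        rewards[action] += reward
        estimates[action] = rewards[action] / counts[action]
        # "Pursue" the action that currently has the highest estimate:
        # p <- (1 - rate) * p + rate * e_best
        best = max(range(n), key=lambda i: estimates[i])
        p = [(1.0 - rate) * pi for pi in p]
        p[best] += rate
    return p, estimates
```

The update keeps the probabilities summing to one at every step, because the vector is scaled by `1 - rate` before `rate` is added to a single component. The method is called indirect because the probability update is driven by the reward estimates rather than by the most recent reward alone.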

Subject(s)

Algorithms , Artificial Intelligence , Cybernetics , Game Theory , Cluster Analysis , Computer Simulation

Multi-way association extraction and visualization from biological text documents using hyper-graphs: applications to genetic association studies for diseases.

Mukhopadhyay, Snehasis; Palakal, Mathew; Maddu, Kalyan.

Artif Intell Med ; 49(3): 145-54, 2010 Jul.

Article in English | MEDLINE | ID: mdl-20382004

ABSTRACT

OBJECTIVES: Biological research literature, as in many other domains of human endeavor, represents a rich, ever growing source of knowledge. An important form of such biological knowledge constitutes associations among biological entities such as genes, proteins, diseases, drugs and chemicals, etc. There has been a considerable amount of recent research in extraction of various kinds of binary associations (e.g., gene-gene, gene-protein, protein-protein, etc.) using different text mining approaches. However, an important aspect of such associations (e.g., "gene A activates protein B") is identifying the context in which such associations occur (e.g., "gene A activates protein B in the context of disease C in organ D under the influence of chemical E"). Such contexts can be represented appropriately by a multi-way relationship involving more than two objects (e.g., objects A, B, C, D, E) rather than the usual binary relationship (objects A and B). METHODS: Such multi-way relations naturally lead to a hyper-graph representation of the knowledge rather than a binary graph. The hyper-graph based multi-way knowledge extraction from biological text literature represents a computationally difficult problem (due to its combinatorial nature) which has not received much attention from the Bioinformatics research community. In this paper, we describe and compare two different approaches to such multi-way hyper-graph extraction: one based on an exhaustive enumeration of all multi-way hyper-edges and the other based on an extension of the well-known A Priori algorithm for structured data to the case of unstructured textual data. We also present a representative graph based approach towards visualizing these genetic association hyper-graphs. RESULTS: Two case studies are conducted for two biomedical problems (related to the diseases of lung cancer and colorectal cancer respectively), illustrating that the latter approach (using the text-based A Priori method) identifies the same hyper-edges as the former approach (the exhaustive method), but at a much lower computational cost. The extracted hyper-relations are presented in the paper as cognition-rich representative graphs, representing the corresponding hyper-graphs. CONCLUSIONS: The text-based A Priori algorithm is a practical, useful method to extract hyper-graphs representing multi-way associations among biological objects. These hyper-graphs and their visualization using representative graphs can provide important contextual information for understanding gene-gene associations relevant to specific diseases.
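The level-wise pruning that makes an A Priori style search cheaper than exhaustive enumeration can be sketched as follows. This is a generic frequent-itemset sketch, not the paper's text-based extension: it assumes each document has already been reduced to the set of entities it mentions, and the names `apriori_hyperedges`, `min_support`, and `max_size` are hypothetical.

```python
from itertools import combinations

def apriori_hyperedges(doc_entities, min_support=2, max_size=5):
    """Return {frozenset(entities): support} for multi-way hyper-edges.

    doc_entities: iterable of entity collections, one per document (assumed
    to come from an earlier entity-recognition step not shown here).
    A hyper-edge is kept if its entities co-occur in >= min_support documents.
    """
    transactions = [frozenset(d) for d in doc_entities]

    def support(itemset):
        # Number of documents that mention every entity in the itemset.
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({e for t in transactions for e in t})
    level = [frozenset([e]) for e in items]
    level = [s for s in level if support(s) >= min_support]
    frequent = {}
    size = 1
    while level and size < max_size:
        size += 1
        prev = set(level)
        # Join step: merge frequent sets from the previous level.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size-1)-subset must itself be frequent.
        candidates = [
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, size - 1))
        ]
        level = []
        for c in candidates:
            sup = support(c)
            if sup >= min_support:
                frequent[c] = sup
                level.append(c)
    return frequent
```

Because any superset of an infrequent set is also infrequent, the prune step discards most candidates before their support is ever counted, which is where the savings over exhaustive enumeration come from.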

Subject(s)

Computer Graphics , Disease/genetics , Genome-Wide Association Study , Computational Biology , Humans

Mining and survey of simple sequence repeats in expressed sequence tags of dicotyledonous species.

Kumpatla, Siva P; Mukhopadhyay, Snehasis.

Genome ; 48(6): 985-98, 2005 Dec.

Article in English | MEDLINE | ID: mdl-16391668

ABSTRACT

Simple sequence repeat (SSR) markers are widely used in many plant and animal genomes due to their abundance, hypervariability, and suitability for high-throughput analysis. Development of SSR markers using molecular methods is time consuming, laborious, and expensive. Use of computational approaches to mine ever-increasing sequences such as expressed sequence tags (ESTs) in public databases permits rapid and economical discovery of SSRs. Most such efforts to date have focused on mining SSRs from monocotyledonous ESTs. In this study, we have computationally mined and examined the abundance of SSRs in more than 1.54 million ESTs belonging to 55 dicotyledonous species. The frequency of ESTs containing SSRs among species ranged from 2.65% to 16.82%. Dinucleotide repeats were found to be the most abundant followed by tri- or mono-nucleotide repeats. The motifs A/T, AG/GA/CT/TC, and AAG/AGA/GAA/CTT/TTC/TCT were the predominant mono-, di-, and tri-nucleotide SSRs, respectively. Most of the mononucleotide SSRs contained 15-25 repeats, whereas the majority of the di- and tri-nucleotide SSRs contained 5-10 repeats. The comprehensive SSR survey data presented here demonstrate the potential of in silico mining of ESTs for rapid development of SSR markers for genetic analysis and applications in dicotyledonous crops.
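In silico SSR mining of the kind described above can be illustrated with a small regular-expression scan for perfect mono-, di-, and tri-nucleotide repeats. This is a sketch, not the pipeline used in the study: the function name `find_ssrs` and the minimum repeat thresholds (15 for mono-, 5 for di- and tri-nucleotide motifs) are assumptions, chosen only to echo the repeat ranges reported in the abstract.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect SSRs in a DNA sequence.

    min_repeats maps motif length -> minimum number of tandem copies.
    The default thresholds are illustrative assumptions, not the study's.
    """
    if min_repeats is None:
        min_repeats = {1: 15, 2: 5, 3: 5}
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least min_rep - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs such as "AA" or "AAA", which are really
            # mononucleotide runs and are reported at unit_len == 1.
            if unit_len > 1 and len(set(motif)) == 1:
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return hits
```

For a survey across many ESTs, the per-sequence hits would be tallied by motif length and motif class; that aggregation is omitted here.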

Subject(s)

Computational Biology , Cotyledon/genetics , Expressed Sequence Tags , Magnoliopsida/genetics , Minisatellite Repeats , Computational Biology/methods , Databases, Nucleic Acid , Species Specificity

TransMiner: mining transitive associations among biological objects from text.

Narayanasamy, Vijay; Mukhopadhyay, Snehasis; Palakal, Mathew; Potter, David A.

J Biomed Sci ; 11(6): 864-73, 2004.

Article in English | MEDLINE | ID: mdl-15591784

ABSTRACT

Associations among biological objects such as genes, proteins, and drugs can be discovered automatically from the scientific literature. TransMiner is a system for finding associations among objects by mining the Medline database of the scientific literature. The direct associations among the objects are discovered based on the principle of co-occurrence in the form of an association graph. The principle of transitive closure is applied to the association graph to find potential transitive associations. The potential transitive associations that are indeed direct are discovered by iterative retrieval and mining of the Medline documents. Those associations that are not found explicitly in the entire Medline database are transitive associations and are the candidates for hypothesis generation. The transitive associations were ranked based on the sum of the weights of terms that co-occur with both objects. The direct and transitive associations are visualized using a graph visualization applet. TransMiner was tested by finding associations among 56 breast cancer genes and among 24 objects in the calpain signal transduction pathway. TransMiner was also used to rediscover associations between magnesium and migraine.
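The co-occurrence graph and the search for transitive candidates can be sketched in a few lines. This is not TransMiner's implementation: object matching is a naive case-insensitive substring test, only two-step paths (A to B to C) are considered rather than a full transitive closure, and the ranking score (summing, over shared neighbours, the smaller of the two co-occurrence counts) is an assumed stand-in for the term-weight ranking described in the abstract. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(docs, objects):
    """Count, for each pair of objects, the documents mentioning both."""
    graph = defaultdict(lambda: defaultdict(int))
    for text in docs:
        lowered = text.lower()
        # Naive matching: an object is "present" if its name is a substring.
        present = sorted(o for o in objects if o.lower() in lowered)
        for a, b in combinations(present, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph

def transitive_candidates(graph, objects):
    """Return (a, c, score, shared_neighbours) for pairs with no direct edge.

    A pair qualifies if the two objects never co-occur directly but share
    at least one neighbour in the association graph.
    """
    ranked = []
    for a, c in combinations(sorted(objects), 2):
        if c in graph[a]:
            continue  # already a direct association
        shared = set(graph[a]) & set(graph[c])
        if shared:
            # Assumed score: strength of the weaker link through each neighbour.
            score = sum(min(graph[a][b], graph[c][b]) for b in shared)
            ranked.append((a, c, score, sorted(shared)))
    ranked.sort(key=lambda r: -r[2])
    return ranked
```

Pairs returned by `transitive_candidates` correspond to the hypothesis candidates in the abstract only after a further check, not shown here, that the pair does not co-occur anywhere in the full literature database.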

Subject(s)

Breast Neoplasms/genetics , Computational Biology/methods , Abstracting and Indexing , Algorithms , Databases, Factual , Databases, Genetic , Humans , Information Storage and Retrieval , MEDLINE , Magnesium/metabolism , Migraine Disorders/metabolism , Models, Theoretical , Natural Language Processing , Signal Transduction , Software

An intelligent biological information management system.

Palakal, Mathew; Mukhopadhyay, Snehasis; Mostafa, Javed; Raje, Rajeev; N'Cho, Mathias; Mishra, Santosh.

Bioinformatics ; 18(10): 1283-8, 2002 Oct.

Article in English | MEDLINE | ID: mdl-12376371

ABSTRACT

MOTIVATION: As biomedical researchers are amassing a plethora of information in a variety of forms resulting from the advancements in biomedical research, there is a critical need for innovative information management and knowledge discovery tools to sift through these vast volumes of heterogeneous data and analysis tools. In this paper we present a general model for an information management system that is adaptable and scalable, followed by a detailed design and implementation of one component of the model. The prototype, called BioSifter, was applied to problems in the bioinformatics area. RESULTS: BioSifter was tested using 500 documents obtained from the PubMed database on two biological problems related to genetic polymorphism and extracorporeal shockwave lithotripsy. The results indicate that BioSifter is a powerful tool for biological researchers to automatically retrieve relevant text documents from biological literature based on their interest profile. The results also indicate that the first stage of the information management process, i.e. data to information transformation, significantly reduces the size of the information space. The filtered data obtained through BioSifter is relevant as well as much smaller in dimension compared to all the retrieved data. This would in turn significantly reduce the complexity associated with the next level transformation, i.e. information to knowledge.

Subject(s)

Artificial Intelligence , Database Management Systems , Databases, Bibliographic , Information Storage and Retrieval/methods , Abstracting and Indexing , Algorithms , Databases, Factual , Feasibility Studies , Humans , Internet , Lithotripsy , Pilot Projects , Polymorphism, Genetic , PubMed , User-Computer Interface , Vocabulary, Controlled

A multi-level text mining method to extract biological relationships.

Palakal, Mathew; Stephens, Matthew; Mukhopadhyay, Snehasis; Raje, Rajeev; Rhodes, Simon.

Proc IEEE Comput Soc Bioinform Conf ; 1: 97-108, 2002.

Article in English | MEDLINE | ID: mdl-15838127

ABSTRACT

Accurate and computationally efficient approaches in discovering relationships between biological objects from text documents are important for biologists to develop biological models. This paper presents a novel approach to extract relationships between multiple biological objects that are present in a text document. The approach involves object identification, reference resolution, ontology and synonym discovery, and extracting object-object relationships. Hidden Markov Models (HMMs), dictionaries, and N-Gram models are used to set the framework to tackle the complex task of extracting object-object relationships. Experiments were carried out using a corpus of one thousand Medline abstracts. Intermediate results were obtained for the object identification process, synonym discovery, and finally the relationship extraction. For a corpus of one thousand abstracts, 53 relationships were extracted of which 43 were correct, giving a specificity of 81%. The approach is both adaptable and scalable to new problems as opposed to rule-based methods.

Subject(s)

Abstracting and Indexing/methods , Artificial Intelligence , Database Management Systems , Information Storage and Retrieval/methods , Natural Language Processing , Periodicals as Topic , Systems Biology/methods , Algorithms , Gene Expression Profiling/methods , MEDLINE , Vocabulary, Controlled
