Search | Nursing VHL Search Portal

1.

NIBBS-search for fast and accurate prediction of phenotype-biased metabolic systems.

Schmidt, Matthew C; Rocha, Andrea M; Padmanabhan, Kanchana; Shpanskaya, Yekaterina; Banfield, Jill; Scott, Kathleen; Mihelcic, James R; Samatova, Nagiza F.

PLoS Comput Biol ; 8(5): e1002490, 2012.

Article in English | MEDLINE | ID: mdl-22589706

ABSTRACT

Understanding genotype-phenotype associations is not only important for furthering our knowledge of internal cellular processes, but also essential for providing the foundation necessary for genetic engineering of microorganisms for industrial use (e.g., production of bioenergy or biofuels). However, genotype-phenotype associations alone do not provide enough information to alter an organism's genome to either suppress or exhibit a phenotype. It is important to look at the phenotype-related genes in the context of the genome-scale network to understand how the genes interact with other genes in the organism. Identification of metabolic subsystems involved in the expression of the phenotype is one way of placing the phenotype-related genes in the context of the entire network. A metabolic system refers to a metabolic network subgraph; its nodes are compounds and its edge labels are the enzymes that catalyze the reactions. The metabolic subsystem could be part of a single metabolic pathway or span parts of multiple pathways. Arguably, comparative genome-scale metabolic network analysis is a promising strategy to identify these phenotype-related metabolic subsystems. Network Instance-Based Biased Subgraph Search (NIBBS) is a graph-theoretic method for genome-scale metabolic network comparative analysis that can identify metabolic systems that are statistically biased toward phenotype-expressing organismal networks. We set up experiments with target phenotypes such as hydrogen production, TCA expression, and acid tolerance. We show via extensive literature search that some of the resulting metabolic subsystems are indeed phenotype-related and formulate hypotheses for other systems in terms of their role in phenotype expression. NIBBS is also orders of magnitude faster than MULE, one of the most efficient maximal frequent subgraph mining algorithms that could be adjusted for this problem. Also, the set of phenotype-biased metabolic systems output by NIBBS comes very close to the set of phenotype-biased subgraphs output by an exact maximally-biased subgraph enumeration algorithm (MBS-Enum). The code (NIBBS and the module to visualize the identified subsystems) is available at http://freescience.org/cs/NIBBS.

Subject(s)

Data Mining/methods , Databases, Protein , Metabolome/physiology , Models, Biological , Protein Interaction Mapping/methods , Proteome/metabolism , Signal Transduction/physiology , Algorithms , Animals , Computer Simulation , Humans , Periodicals as Topic , Phenotype

2.

In-silico identification of phenotype-biased functional modules.

Padmanabhan, Kanchana; Wilson, Kevin; Rocha, Andrea M; Wang, Kuangyu; Mihelcic, James R; Samatova, Nagiza F.

Proteome Sci ; 10 Suppl 1: S2, 2012 Jun 21.

Article in English | MEDLINE | ID: mdl-22759578

ABSTRACT

BACKGROUND: Phenotypes exhibited by microorganisms can be useful for several purposes, e.g., ethanol as an alternate fuel. Sometimes, the target phenotype may be required in combination with other phenotypes in order to be useful; e.g., an industrial process may require that the organism survive in an anaerobic, alcohol-rich environment and be able to feed on both hexose and pentose sugars to produce ethanol. This combination of traits may not be available in any existing organism or, if it does exist, the mechanisms involved in the phenotype expression may not be efficient enough to be useful. Thus, it may be necessary to genetically modify microorganisms. However, before any genetic modification can take place, it is important to identify the underlying cellular subsystems responsible for the expression of the target phenotype. RESULTS: In this paper, we develop a method to identify statistically significant and phenotypically-biased functional modules. The method can compare the organismal network information from hundreds of phenotype-expressing and phenotype non-expressing organisms to identify cellular subsystems that are more prone to occur in phenotype-expressing organisms than in phenotype non-expressing organisms. We have provided literature evidence that the phenotype-biased modules identified for phenotypes such as hydrogen production (dark and light fermentation), respiration, gram-positive, gram-negative, and motility are indeed phenotype-related. CONCLUSION: Thus, we have proposed a methodology to identify phenotype-biased cellular subsystems. We have shown the effectiveness of our methodology by applying it to several target phenotypes. The code and all supplemental files can be downloaded from http://freescience.org/cs/phenotype-biased-biclusters/.

3.

Efficient α, β-motif finder for identification of phenotype-related functional modules.

Schmidt, Matthew C; Rocha, Andrea M; Padmanabhan, Kanchana; Chen, Zhengzhang; Scott, Kathleen; Mihelcic, James R; Samatova, Nagiza F.

BMC Bioinformatics ; 12: 440, 2011 Nov 11.

Article in English | MEDLINE | ID: mdl-22078292

ABSTRACT

BACKGROUND: Microbial communities in their natural environments exhibit phenotypes that can directly cause particular diseases, convert biomass or wastewater to energy, or degrade various environmental contaminants. Understanding how these communities realize specific phenotypic traits (e.g., carbon fixation, hydrogen production) is critical for addressing health, bioremediation, or bioenergy problems. RESULTS: In this paper, we describe a graph-theoretical method for in silico prediction of the cellular subsystems that are related to the expression of a target phenotype. The proposed (α, β)-motif finder approach allows for identification of these phenotype-related subsystems that, in addition to metabolic subsystems, could include their regulators, sensors, transporters, and even uncharacterized proteins. By comparing dozens of genome-scale networks of functionally associated proteins, our method efficiently identifies those statistically significant functional modules that are in at least α networks of phenotype-expressing organisms but appear in no more than β networks of organisms that do not exhibit the target phenotype. It has been shown via various experiments that the enumerated modules are indeed related to phenotype-expression when tested with different target phenotypes like hydrogen production, motility, aerobic respiration, and acid-tolerance. CONCLUSION: Thus, we have proposed a methodology that can identify potential statistically significant phenotype-related functional modules. The functional module is modeled as an (α, β)-clique, where α and β are two criteria introduced in this work. We also propose a novel network model, called the two-typed, divided network. The new network model and the criteria make the problem tractable even while very large networks are being compared. The code can be downloaded from http://www.freescience.org/cs/ABClique/
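
The (α, β) criterion stated in the abstract above can be illustrated with a small filter. This is only a sketch under simplifying assumptions, not the authors' ABClique implementation: each network is reduced to a set of protein identifiers, a module "appears" in a network when all of its members are present (edges and statistical significance are ignored), and the names `count_support` and `alpha_beta_filter` are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set


def count_support(module: FrozenSet[str], networks: Iterable[Set[str]]) -> int:
    """Count the networks that contain every protein of the module.

    Assumption: a network is simplified to its set of protein identifiers.
    """
    return sum(1 for net in networks if module <= net)


def alpha_beta_filter(
    candidates: Iterable[FrozenSet[str]],
    expressing: List[Set[str]],
    non_expressing: List[Set[str]],
    alpha: int,
    beta: int,
) -> List[FrozenSet[str]]:
    """Keep modules found in at least `alpha` phenotype-expressing networks
    and in no more than `beta` non-expressing networks."""
    return [
        m
        for m in candidates
        if count_support(m, expressing) >= alpha
        and count_support(m, non_expressing) <= beta
    ]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    expressing = [{"A", "B", "C"}, {"A", "B"}, {"A", "B", "D"}]
    non_expressing = [{"A", "C"}, {"B", "D"}]
    candidates = [frozenset({"A", "B"}), frozenset({"A"}), frozenset({"C"})]
    print(alpha_beta_filter(candidates, expressing, non_expressing, alpha=2, beta=0))
```

In the toy data only the module {A, B} passes with α = 2 and β = 0: it occurs in all three expressing networks and in neither non-expressing one.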

Subject(s)

Acids/metabolism , Algorithms , Bacteria/genetics , Bacteria/metabolism , Computing Methodologies , Citric Acid Cycle , Hydrogen/metabolism , Phenotype , Proteobacteria

4.

A high-throughput de novo sequencing approach for shotgun proteomics using high-resolution tandem mass spectrometry.

Pan, Chongle; Park, Byung H; McDonald, William H; Carey, Patricia A; Banfield, Jillian F; VerBerkmoes, Nathan C; Hettich, Robert L; Samatova, Nagiza F.

BMC Bioinformatics ; 11: 118, 2010 Mar 05.

Article in English | MEDLINE | ID: mdl-20205730

ABSTRACT

BACKGROUND: High-resolution tandem mass spectra can now be readily acquired with hybrid instruments, such as LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. The improved spectral quality enables more accurate de novo sequencing for identification of post-translational modifications and amino acid polymorphisms. RESULTS: In this study, a new de novo sequencing algorithm, called Vonode, has been developed specifically for analysis of such high-resolution tandem mass spectra. To fully exploit the high mass accuracy of these spectra, a unique scoring system is proposed to evaluate sequence tags based primarily on mass accuracy information of fragment ions. Consensus sequence tags were inferred for 11,422 spectra with an average peptide length of 5.5 residues from a total of 40,297 input spectra acquired in a 24-hour proteomics measurement of Rhodopseudomonas palustris. The accuracy of inferred consensus sequence tags was 84%. According to our comparison, the performance of Vonode was shown to be superior to the PepNovo v2.0 algorithm, in terms of the number of de novo sequenced spectra and the sequencing accuracy. CONCLUSIONS: Here, we improved de novo sequencing performance by developing a new algorithm specifically for high-resolution tandem mass spectral data. The Vonode algorithm is freely available for download at http://compbio.ornl.gov/Vonode.

Subject(s)

Algorithms , Proteomics/methods , Tandem Mass Spectrometry/methods , Amino Acid Sequence , Databases, Protein , Peptides/chemistry , Protein Processing, Post-Translational , Sequence Analysis, Protein

5.

Characterization of anaerobic catabolism of p-coumarate in Rhodopseudomonas palustris by integrating transcriptomics and quantitative proteomics.

Pan, Chongle; Oda, Yasuhiro; Lankford, Patricia K; Zhang, Bing; Samatova, Nagiza F; Pelletier, Dale A; Harwood, Caroline S; Hettich, Robert L.

Mol Cell Proteomics ; 7(5): 938-48, 2008 May.

Article in English | MEDLINE | ID: mdl-18156135

ABSTRACT

In this study, the pathway for anaerobic catabolism of p-coumarate by a model bacterium, Rhodopseudomonas palustris, was characterized by comparing the gene expression profiles of cultures grown in the presence of p-coumarate, benzoate, or succinate as the sole carbon source. Gene expression was quantified at the mRNA level with transcriptomics and at the protein level with quantitative proteomics using 15N metabolic labeling. Protein relative abundances, along with their confidence intervals for statistical significance evaluation, were estimated with the software ProRata. Both -omics measurements were used because transcriptomics provided near-full genome coverage of gene expression profiles, while quantitative proteomics ascertained abundance changes of over 1600 proteins. The integrated gene expression data are consistent with the hypothesis that p-coumarate is converted to benzoyl-CoA, which is then degraded via a known aromatic ring reduction pathway. For the metabolism of p-coumarate to benzoyl-CoA, two alternative routes, a beta-oxidation route and a non-beta-oxidation route, are possible. The integrated gene expression data provided strong support for the non-beta-oxidation route in R. palustris. A putative gene was proposed for every step in the non-beta-oxidation route.

Subject(s)

Bacterial Proteins/metabolism , Coumaric Acids/metabolism , Gene Expression Profiling , Proteomics , Rhodopseudomonas/growth & development , Rhodopseudomonas/metabolism , Anaerobiosis/genetics , Bacterial Proteins/analysis , Bacterial Proteins/genetics , Benzoates/metabolism , Protein Biosynthesis/genetics , RNA, Messenger/analysis , RNA, Messenger/metabolism , Rhodopseudomonas/genetics , Succinic Acid/metabolism

6.

From pull-down data to protein interaction networks and complexes with biological relevance.

Zhang, Bing; Park, Byung-Hoon; Karpinets, Tatiana; Samatova, Nagiza F.

Bioinformatics ; 24(7): 979-86, 2008 Apr 01.

Article in English | MEDLINE | ID: mdl-18304937

ABSTRACT

MOTIVATION: Recent improvements in high-throughput Mass Spectrometry (MS) technology have expedited genome-wide discovery of protein-protein interactions by providing a capability of detecting protein complexes in a physiological setting. Computational inference of protein interaction networks and protein complexes from MS data is challenging. Advances are required in developing robust and seamlessly integrated procedures for assessment of protein-protein interaction affinities, mathematical representation of protein interaction networks, discovery of protein complexes, and evaluation of their biological relevance. RESULTS: A multi-step but easy-to-follow framework for identifying protein complexes from MS pull-down data is introduced. It assesses interaction affinity between two proteins based on similarity of their co-purification patterns derived from MS data. It constructs a protein interaction network by adopting a knowledge-guided threshold selection method. Based on the network, it identifies protein complexes and infers their core components using a graph-theoretical approach. It deploys a statistical evaluation procedure to assess the biological relevance of each complex found. On Saccharomyces cerevisiae pull-down data, the framework outperformed other, more complicated schemes by at least 10% in F1-measure and identified 610 protein complexes with high functional homogeneity based on enrichment in Gene Ontology (GO) annotation. Manual examination of the complexes suggested hypotheses about the causes of false identifications. Namely, co-purification of different protein complexes as mediated by a common non-protein molecule, such as DNA, might be a source of false positives. Protein identification bias in pull-down technology, such as the hydrophilic bias, could result in false negatives.

Subject(s)

Databases, Protein , Gene Expression Profiling/methods , Models, Biological , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/metabolism , Signal Transduction/physiology , Algorithms , Biology/methods , Computer Simulation , Information Storage and Retrieval/methods , Peptide Mapping/methods , Structure-Activity Relationship , Systems Integration

7.

BioDEAL: community generation of biological annotations.

Breimyer, Paul; Green, Nathan; Kumar, Vinay; Samatova, Nagiza F.

BMC Med Inform Decis Mak ; 9 Suppl 1: S5, 2009 Nov 03.

Article in English | MEDLINE | ID: mdl-19891799

ABSTRACT

BACKGROUND: Publication databases in biomedicine (e.g., PubMed, MEDLINE) are growing rapidly in size every year, as are public databases of experimental biological data and annotations derived from the data. Publications often contain evidence that confirms or disproves annotations, such as putative protein functions. However, it is increasingly difficult for biologists to identify and process published evidence due to the volume of papers and the lack of a systematic approach to associate published evidence with experimental data and annotations. Natural Language Processing (NLP) tools can help address the growing divide by providing automatic high-throughput detection of simple terms in publication text. However, NLP tools are not mature enough to identify complex terms, relationships, or events. RESULTS: In this paper we present and extend BioDEAL, a community evidence annotation system that introduces a feedback loop into the database-publication cycle to allow scientists to connect data-driven biological concepts to publications. CONCLUSION: BioDEAL may change the way biologists relate published evidence to experimental data. Instead of biologists or research groups searching and managing evidence independently, the community can collectively build and share this knowledge.

Subject(s)

Databases as Topic/organization & administration , Information Storage and Retrieval/methods , Natural Language Processing , Periodicals as Topic , Database Management Systems , Internet , Social Support

8.

Sex Differences in Cognitive Decline in Subjects with High Likelihood of Mild Cognitive Impairment due to Alzheimer's disease.

Sohn, Dongwha; Shpanskaya, Katie; Lucas, Joseph E; Petrella, Jeffrey R; Saykin, Andrew J; Tanzi, Rudolph E; Samatova, Nagiza F; Doraiswamy, P Murali.

Sci Rep ; 8(1): 7490, 2018 05 10.

Article in English | MEDLINE | ID: mdl-29748598

ABSTRACT

Sex differences in Alzheimer's disease (AD) biology and progression are not yet fully characterized. The goal of this study is to examine the effect of sex on cognitive progression in subjects with high likelihood of mild cognitive impairment (MCI) due to Alzheimer's who were followed for up to 10 years in the Alzheimer's Disease Neuroimaging Initiative (ADNI). Cerebrospinal fluid total-tau and amyloid-beta (Aβ42) ratio values were used to sub-classify 559 MCI subjects (216 females, 343 males) as having "high" or "low" likelihood for MCI due to Alzheimer's. Data were analyzed using mixed-effects models incorporating all follow-ups. The worsening from baseline in Alzheimer's Disease Assessment Scale-Cognitive score (mean ± SD) in subjects with high likelihood of MCI due to Alzheimer's (9 ± 12) was markedly greater than that in subjects with low likelihood (1 ± 6, p < 0.0001). Among MCI due to AD subjects, the mean worsening in cognitive score was significantly greater in females (11.58 ± 14) than in males (6.87 ± 11, p = 0.006). Our findings highlight the need to replicate these results in other populations and to develop sex-specific timelines for Alzheimer's disease progression.

Subject(s)

Alzheimer Disease/epidemiology , Alzheimer Disease/etiology , Cognition/physiology , Cognitive Dysfunction/epidemiology , Cognitive Dysfunction/etiology , Sex Characteristics , Aged , Aged, 80 and over , Alzheimer Disease/diagnosis , Alzheimer Disease/pathology , Cognitive Dysfunction/diagnosis , Disease Progression , Female , Humans , Longitudinal Studies , Male , Neuroimaging , Neuropsychological Tests , Prevalence , Retrospective Studies , Risk Factors

9.

The Application of the Weighted k-Partite Graph Problem to the Multiple Alignment for Metabolic Pathways.

Chen, Wenbin; Hendrix, William; Samatova, Nagiza F.

J Comput Biol ; 24(12): 1195-1211, 2017 Dec.

Article in English | MEDLINE | ID: mdl-28891687

ABSTRACT

The problem of aligning multiple metabolic pathways is a very challenging problem in computational biology. A metabolic pathway consists of three types of entities: reactions, compounds, and enzymes. Based on similarities between enzymes, Tohsato et al. gave an algorithm for aligning multiple metabolic pathways. However, the algorithm given by Tohsato et al. neglects the similarities among reactions and compounds, as well as pathway topology. Designing algorithms for the alignment of multiple metabolic pathways based on the similarity of reactions, compounds, and enzymes is a difficult computational problem. In this article, we propose an algorithm for the problem of aligning multiple metabolic pathways based on the similarities among reactions, compounds, enzymes, and pathway topology. First, we compute a weight between each pair of like entities in different input pathways based on the entities' similarity score and topological structure using Ay et al.'s methods. We then construct a weighted k-partite graph for the reactions, compounds, and enzymes. We extract a mapping between these entities by solving the maximum-weighted k-partite matching problem with a novel heuristic algorithm. By analyzing the alignment results of multiple pathways in different organisms, we show that the alignments found by our algorithm correctly identify common subnetworks among multiple pathways.

Subject(s)

Algorithms , Computational Biology/methods , Metabolic Networks and Pathways , Humans , Protein Interaction Mapping , Sequence Alignment

10.

In silico discovery of enzyme-substrate specificity-determining residue clusters.

Yu, Gong-Xin; Park, Byung-Hoon; Chandramohan, Praveen; Munavalli, Rajesh; Geist, Al; Samatova, Nagiza F.

J Mol Biol ; 352(5): 1105-17, 2005 Oct 07.

Article in English | MEDLINE | ID: mdl-16140329

ABSTRACT

The binding between an enzyme and its substrate is highly specific, despite the fact that many different enzymes show significant sequence and structure similarity. There must be, then, substrate specificity-determining residues that enable different enzymes to recognize their unique substrates. We reason that a coordinated, not independent, action of both conserved and non-conserved residues determines enzymatic activity and specificity. Here, we present a surface patch ranking (SPR) method for in silico discovery of substrate specificity-determining residue clusters by exploring both sequence conservation and correlated mutations. As case studies we apply SPR to several highly homologous enzymatic protein pairs, such as guanylyl versus adenylyl cyclases, lactate versus malate dehydrogenases, and trypsin versus chymotrypsin. Without using experimental data, we predict several single and multi-residue clusters that are consistent with previous mutagenesis experimental results. Most single-residue clusters are directly involved in enzyme-substrate interactions, whereas multi-residue clusters are vital for domain-domain and regulator-enzyme interactions, indicating their complementary role in specificity determination. These results demonstrate that SPR may help the selection of target residues for mutagenesis experiments and, thus, focus rational drug design, protein engineering, and functional annotation on the relevant regions of a protein.

Subject(s)

Amino Acids/chemistry , Amino Acids/physiology , Computational Biology , Enzymes/chemistry , Enzymes/physiology , Adenylyl Cyclases/physiology , Amino Acid Sequence , Animals , Binding Sites/physiology , Cattle , Chymotrypsin/physiology , Crystallography, X-Ray , Enzymes/genetics , Guanylate Cyclase/physiology , L-Lactate Dehydrogenase/physiology , Malate Dehydrogenase/physiology , Molecular Sequence Data , Protein Structure, Tertiary , Substrate Specificity/physiology , Trypsin/chemistry , Trypsin/physiology

11.

The sorting direct method for stochastic simulation of biochemical systems with varying reaction execution behavior.

McCollum, James M; Peterson, Gregory D; Cox, Chris D; Simpson, Michael L; Samatova, Nagiza F.

Comput Biol Chem ; 30(1): 39-49, 2006 Feb.

Article in English | MEDLINE | ID: mdl-16321569

ABSTRACT

A key to advancing the understanding of molecular biology in the post-genomic age is the development of accurate predictive models for genetic regulation, protein interaction, metabolism, and other biochemical processes. To facilitate model development, simulation algorithms must provide an accurate representation of the system, while performing the simulation in a reasonable amount of time. Gillespie's stochastic simulation algorithm (SSA) accurately depicts spatially homogeneous models with small populations of chemical species and properly represents noise, but it is often abandoned when modeling larger systems because of its computational complexity. In this work, we examine the performance of different versions of the SSA when applied to several biochemical models. Through our analysis, we discover that transient changes in reaction execution frequencies, which are typical of biochemical models with gene induction and repression, can dramatically affect simulator performance. To account for these shifts, we propose a new algorithm called the sorting direct method that maintains a loosely sorted order of the reactions as the simulation executes. Our measurements show that the sorting direct method performs favorably when compared to other well-known exact stochastic simulation algorithms.
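
The idea of keeping reactions in a loosely sorted order can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the usual direct-method step (exponential waiting time, then a linear search through cumulative propensities) and assumes the loose sorting is achieved by moving the reaction that just fired one position earlier in the search order. The function name `sorting_direct_step` and the data layout (a list of propensity-function and state-change pairs) are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]
Reaction = Tuple[Callable[[State], float], Dict[str, int]]


def sorting_direct_step(
    state: State,
    reactions: List[Reaction],
    order: List[int],
    rng: random.Random,
) -> Optional[Tuple[float, int]]:
    """Perform one stochastic simulation step; mutate `state` and `order`.

    Returns (waiting_time, fired_reaction_index), or None if nothing can fire.
    """
    # Propensities are evaluated in the current search order.
    propensities = [reactions[j][0](state) for j in order]
    total = sum(propensities)
    if total <= 0:
        return None

    tau = rng.expovariate(total)      # time until the next reaction
    target = rng.random() * total     # point in the cumulative propensity range

    # Linear search: frequently firing reactions near the front are found sooner.
    running = 0.0
    pos = -1
    for i, a in enumerate(propensities):
        if a <= 0:
            continue
        pos = i
        running += a
        if target < running:
            break

    fired = order[pos]
    for species, delta in reactions[fired][1].items():
        state[species] = state.get(species, 0) + delta

    # Assumed sorting rule: bubble the fired reaction up by one position.
    if pos > 0:
        order[pos - 1], order[pos] = order[pos], order[pos - 1]
    return tau, fired


if __name__ == "__main__":
    # Toy model, invented for illustration: A -> B fast, B -> A slow.
    reactions = [
        (lambda s: 0.1 * s["B"], {"B": -1, "A": +1}),
        (lambda s: 1.0 * s["A"], {"A": -1, "B": +1}),
    ]
    state = {"A": 50, "B": 0}
    order = [0, 1]
    rng = random.Random(1)
    t = 0.0
    for _ in range(20):
        step = sorting_direct_step(state, reactions, order, rng)
        if step is None:
            break
        t += step[0]
    print(t, state, order)
```

The single-position swap is cheap, and over many steps it lets the search order track shifts in which reactions fire most often, which is the situation the abstract describes for models with gene induction and repression.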

Subject(s)

Models, Chemical , Stochastic Processes , Systems Biology/methods , Algorithms , Aliivibrio fischeri/chemistry , Escherichia coli/chemistry

12.

An evolution-based analysis scheme to identify CO2/O2 specificity-determining factors for ribulose 1,5-bisphosphate carboxylase/oxygenase.

Yu, Gong-Xin; Park, Byung-Hoon; Chandramohan, Praveen; Geist, Al; Samatova, Nagiza F.

Protein Eng Des Sel ; 18(12): 589-96, 2005 Dec.

Article in English | MEDLINE | ID: mdl-16246824

ABSTRACT

Ribulose 1,5-bisphosphate carboxylase/oxygenase (RuBisCo) catalyzes a rate-limiting step in photosynthetic carbon assimilation (reacting with CO2) and its competitive photo-respiratory carbon oxidation (reacting with O2). A RuBisCo enzyme with an enhanced CO2/O2 specificity could enable great progress in agricultural production and environmental management. RuBisCos in marine non-green algae, resulting from an earlier endo-symbiotic event, diverge greatly from those in green plants and cyanobacteria and, further, have the highest CO2/O2 specificity, whereas RuBisCos in cyanobacteria have the lowest. We assumed that there exist different levels of CO2/O2 specificity-determining factors, corresponding to different evolutionary events and specificity levels. Based on this assumption, we devised a scheme to identify these specificity-determining factors. From this analysis, we are able to discover different categories of the CO2/O2 specificity-determining factors that show which residue substitutions account for (relatively) small specificity changes, as happened in green plants, or a tremendous enhancement, as observed in marine non-green algae. Therefore, the analysis can improve our understanding of molecular mechanisms in the development of substrate specificity and prioritize candidate specificity-determining surface residues for site-directed mutagenesis.

Subject(s)

Carbon Dioxide/metabolism , Oxygen/metabolism , Ribulose-Bisphosphate Carboxylase/genetics , Amino Acid Sequence , Computational Biology , Cyanobacteria/enzymology , Databases, Protein , Eukaryota/enzymology , Evolution, Molecular , Models, Molecular , Molecular Sequence Data , Mutation , Plants/enzymology , Ribulose-Bisphosphate Carboxylase/metabolism , Sequence Homology, Amino Acid , Substrate Specificity

13.

On FastMap and the convex hull of multivariate data: toward fast and robust dimension reduction.

Ostrouchov, George; Samatova, Nagiza F.

IEEE Trans Pattern Anal Mach Intell ; 27(8): 1340-3, 2005 Aug.

Article in English | MEDLINE | ID: mdl-16119272

ABSTRACT

FastMap is a dimension reduction technique that operates on distances between objects. Although only distances are used, implicitly the technique assumes that the objects are points in a p-dimensional Euclidean space. It selects a sequence of k ≤ p orthogonal axes defined by distant pairs of points (called pivots) and computes the projection of the points onto the orthogonal axes. We show that FastMap uses only the outer envelope of a data set. Pivots are taken from the faces, usually vertices, of the convex hull of the data points in the original implicit Euclidean space. This provides a bridge to results in robust statistics, where the convex hull is used as a tool in multivariate outlier detection and in robust estimation methods. The connection sheds new light on the properties of FastMap, particularly its sensitivity to outliers, and provides an opportunity for a new class of dimension reduction algorithms, RobustMaps, that retain the speed of FastMap and exploit ideas in robust statistics.
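
The pivot-and-projection step described above can be sketched for a single axis. The abstract does not give formulas, so this sketch assumes the commonly cited FastMap formulation: pivots are picked with a farthest-point heuristic, and each object is projected with the law-of-cosines expression x_i = (d(a,i)^2 + d(a,b)^2 - d(b,i)^2) / (2 d(a,b)). The names `choose_pivots` and `fastmap_axis` are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Distance = Callable[[int, int], float]


def choose_pivots(n: int, dist: Distance, start: int = 0) -> Tuple[int, int]:
    """Heuristic pivot choice: farthest object from `start`, then farthest from that."""
    b = max(range(n), key=lambda i: dist(start, i))
    a = max(range(n), key=lambda i: dist(b, i))
    return a, b


def fastmap_axis(n: int, dist: Distance) -> Tuple[int, int, List[float]]:
    """Project n objects onto the line through one pivot pair, using distances only."""
    a, b = choose_pivots(n, dist)
    d_ab = dist(a, b)
    if d_ab == 0:
        # All objects coincide with respect to this distance; nothing to spread out.
        return a, b, [0.0] * n
    coords = [
        (dist(a, i) ** 2 + d_ab ** 2 - dist(b, i) ** 2) / (2 * d_ab)
        for i in range(n)
    ]
    return a, b, coords


if __name__ == "__main__":
    # Toy points, invented for illustration; only their distances are used.
    points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
    a, b, coords = fastmap_axis(len(points), lambda i, j: math.dist(points[i], points[j]))
    print(a, b, coords)
```

Because the pivots are chosen as mutually distant objects, a single extreme outlier tends to be selected as a pivot, which is consistent with the sensitivity to outliers noted in the abstract.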

Subject(s)

Algorithms , Artificial Intelligence , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Pattern Recognition, Automated/methods , Cluster Analysis , Computer Simulation , Models, Statistical , Multivariate Analysis , Signal Processing, Computer-Assisted , Subtraction Technique

14.

An efficient algorithm for pairwise local alignment of protein interaction networks.

Chen, Wenbin; Schmidt, Matthew; Tian, Wenhong; Samatova, Nagiza F; Zhang, Shaohong.

J Bioinform Comput Biol ; 13(2): 1550003, 2015 Apr.

Article in English | MEDLINE | ID: mdl-25477149

ABSTRACT

Recently, researchers seeking to understand, modify, and create beneficial traits in organisms have looked for evolutionarily conserved patterns of protein interactions. Their conservation likely means that the proteins of these conserved functional modules are important to the trait's expression. In this paper, we formulate the problem of identifying these conserved patterns as a graph optimization problem, and develop a fast heuristic algorithm for this problem. We compare the performance of our network alignment algorithm to that of the MaWISh algorithm [Koyutürk M, Kim Y, Topkara U, Subramaniam S, Szpankowski W, Grama A, Pairwise alignment of protein interaction networks, J Comput Biol 13(2):182-199, 2006], which bases its search algorithm on a related decision problem formulation. We find that our algorithm discovers conserved modules with a larger number of proteins in an order of magnitude less time. The protein sets found by our algorithm correspond to known conserved functional modules at precision and recall rates comparable to those produced by the MaWISh algorithm.

Subject(s)

Algorithms , Protein Interaction Maps , Sequence Alignment/statistics & numerical data , Animals , Computational Biology , Conserved Sequence , Gene Ontology/statistics & numerical data , Humans , Protein Interaction Mapping/statistics & numerical data

15.

Carbon sequestration in Synechococcus Sp.: from molecular machines to hierarchical modeling.

Heffelfinger, Grant S; Martino, Anthony; Gorin, Andrey; Xu, Ying; Rintoul, Mark D; Geist, Al; Al-Hashimi, Hashim M; Davidson, George S; Faulon, Jean Loup; Frink, Laurie J; Haaland, David M; Hart, William E; Jakobsson, Erik; Lane, Todd; Li, Ming; Locascio, Phil; Olken, Frank; Olman, Victor; Palenik, Brian; Plimpton, Steven J; Roe, Diana C; Samatova, Nagiza F; Shah, Manesh; Shoshoni, Arie; Strauss, Charlie E M; Thomas, Edward V; Timlin, Jerilyn A; Xu, Dong.

OMICS ; 6(4): 305-30, 2002.

Article in English | MEDLINE | ID: mdl-12626091

ABSTRACT

The U.S. Department of Energy recently announced the first five grants for the Genomes to Life (GTL) Program. The goal of this program is to "achieve the most far-reaching of all biological goals: a fundamental, comprehensive, and systematic understanding of life." While more information about the program can be found at the GTL website (www.doegenomestolife.org), this paper provides an overview of one of the five GTL projects funded, "Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling." This project is a combined experimental and computational effort emphasizing developing, prototyping, and applying new computational tools and methods to elucidate the biochemical mechanisms of the carbon sequestration of Synechococcus Sp., an abundant marine cyanobacterium known to play an important role in the global carbon cycle. Understanding, predicting, and perhaps manipulating carbon fixation in the oceans has long been a major focus of biological oceanography and has more recently been of interest to a broader audience of scientists and policy makers. It is clear that the oceanic sinks and sources of CO2 are important terms in the global environmental response to anthropogenic atmospheric inputs of CO2 and that oceanic microorganisms play a key role in this response. However, the relationship between this global phenomenon and the biochemical mechanisms of carbon fixation in these microorganisms is poorly understood. The project includes five subprojects: an experimental investigation, three computational biology efforts, and a fifth that addresses computational infrastructure challenges of relevance to this project and the Genomes to Life program as a whole. Our experimental effort is designed to provide biology and data to drive the computational efforts and includes significant investment in developing new experimental methods for uncovering protein partners, characterizing protein complexes, and identifying new binding domains. We will also develop and apply new data measurement and statistical methods for analyzing microarray experiments. Our computational efforts include coupling molecular simulation methods with knowledge discovery from diverse biological data sets for high-throughput discovery and characterization of protein-protein complexes and developing a set of novel capabilities for inference of regulatory pathways in microbial genomes across multiple sources of information through the integration of computational and experimental technologies. These capabilities will be applied to Synechococcus regulatory pathways to characterize their interaction map and identify component proteins in these pathways. We will also investigate methods for combining experimental and computational results with visualization and natural language tools to accelerate discovery of regulatory pathways. Furthermore, given that the ultimate goal of this effort is to develop a systems-level understanding of how the Synechococcus genome affects carbon fixation at the global scale, we will develop and apply a set of tools for capturing the carbon fixation behavior of Synechococcus at different levels of resolution. Finally, because the explosion of data being produced by high-throughput experiments requires data analysis and models that are more computationally complex, more heterogeneous, and coupled to ever-increasing amounts of experimentally obtained data in varying formats, we have also established a companion computational infrastructure to support this effort as well as the Genomes to Life program as a whole.

Subject(s)

Carbon/metabolism , Cyanobacteria/physiology , Genome , Algorithms , Carbon/physiology , Cyanobacteria/metabolism , Mass Spectrometry , Models, Biological , Models, Statistical , Research/trends , Software

16.

Global alignment of pairwise protein interaction networks for maximal common conserved patterns.

Tian, Wenhong; Samatova, Nagiza F.

Int J Genomics ; 2013: 670623, 2013.

Article in English | MEDLINE | ID: mdl-23710435

ABSTRACT

A number of tools for the alignment of protein-protein interaction (PPI) networks have laid the foundation for PPI network analysis. Most alignment tools focus on finding conserved interaction regions across the PPI networks through either local or global mapping of similar sequences. Researchers are still trying to improve the speed, scalability, and accuracy of network alignment. In view of this, we introduce a fast, connected-components-based algorithm, HopeMap, for network alignment. Observing that the number of true orthologs across species is small compared to the total number of proteins in all species, we take a different approach based on a precompiled list of homologs identified by KO terms. Applying this approach to S. cerevisiae (yeast) and D. melanogaster (fly), E. coli K12 and S. typhimurium, and E. coli K12 and C. crescentus, we analyze all clusters identified in the alignment. The results are evaluated through up-to-date known gene annotations, gene ontology (GO), and KEGG ortholog groups (KO). Compared to existing tools, our approach is fast with linear computational cost, highly accurate in terms of KO and GO term specificity and sensitivity, and can be extended to multiple alignments easily.
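
The abstract does not give HopeMap's actual procedure, so the following is only a minimal sketch of the general idea it names: map proteins to KO terms from a precompiled table, keep the KO-level interactions present in both networks, and report the connected components of what remains. The function name `conserved_components`, the edge-list and dictionary inputs, and the rule of dropping same-KO interactions are all illustrative assumptions, not part of the published method.

```python
from collections import defaultdict


def conserved_components(edges_a, edges_b, ko_a, ko_b):
    """Connected components of KO-level interactions shared by two PPI networks.

    edges_a, edges_b: iterables of (protein, protein) pairs (hypothetical input format).
    ko_a, ko_b: dicts mapping protein id -> KO term (hypothetical precompiled table).
    """
    def ko_edges(edges, ko):
        # Project protein-level edges onto KO terms; skip unmapped proteins
        # and self-pairs (an assumption made for this sketch).
        projected = set()
        for u, v in edges:
            if u in ko and v in ko and ko[u] != ko[v]:
                projected.add(frozenset((ko[u], ko[v])))
        return projected

    shared = ko_edges(edges_a, ko_a) & ko_edges(edges_b, ko_b)

    # Union-find over KO terms that take part in a shared interaction.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for edge in shared:
        a, b = tuple(edge)
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

Each edge is visited once and the union-find operations are close to constant time, which is consistent with (though not a reproduction of) the near-linear cost the abstract reports.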

17.

Complex biomarker discovery in neuroimaging data: Finding a needle in a haystack.

Atluri, Gowtham; Padmanabhan, Kanchana; Fang, Gang; Steinbach, Michael; Petrella, Jeffrey R; Lim, Kelvin; Macdonald, Angus; Samatova, Nagiza F; Doraiswamy, P Murali; Kumar, Vipin.

Neuroimage Clin ; 3: 123-31, 2013 Aug 07.

Article in English | MEDLINE | ID: mdl-24179856

ABSTRACT

Neuropsychiatric disorders such as schizophrenia, bipolar disorder and Alzheimer's disease are major public health problems. However, despite decades of research, we currently have no validated prognostic or diagnostic tests that can be applied at an individual patient level. Many neuropsychiatric diseases are due to a combination of alterations that occur in a human brain rather than the result of localized lesions. While there is hope that newer imaging technologies such as functional and anatomic connectivity MRI or molecular imaging may offer breakthroughs, the single biomarkers that are discovered using these datasets are limited by their inability to capture the heterogeneity and complexity of most multifactorial brain disorders. Recently, complex biomarkers have been explored to address this limitation using neuroimaging data. In this manuscript we consider the nature of complex biomarkers being investigated in the recent literature and present techniques to find such biomarkers that have been developed in related areas of data mining, statistics, machine learning and bioinformatics.

18.

Functional annotation of hierarchical modularity.

Padmanabhan, Kanchana; Wang, Kuangyu; Samatova, Nagiza F.

PLoS One ; 7(4): e33744, 2012.

Article in English | MEDLINE | ID: mdl-22496762

ABSTRACT

In biological networks of molecular interactions in a cell, network motifs that are biologically relevant are also functionally coherent, or form functional modules. These functionally coherent modules combine in a hierarchical manner into larger, less cohesive subsystems, thus revealing one of the essential design principles of system-level cellular organization and function: hierarchical modularity. Arguably, hierarchical modularity has not been explicitly taken into consideration by most, if not all, functional annotation systems. As a result, the existing methods would often fail to assign a statistically significant functional coherence score to biologically relevant molecular machines. We developed a methodology for hierarchical functional annotation. Given the hierarchical taxonomy of functional concepts (e.g., Gene Ontology) and the association of individual genes or proteins with these concepts (e.g., GO terms), our method will assign a Hierarchical Modularity Score (HMS) to each node in the hierarchy of functional modules; the HMS and its p-value measure the functional coherence of each module in the hierarchy. While existing methods annotate each module with a set of "enriched" functional terms in a bag of genes, our complementary method provides the hierarchical functional annotation of the modules and their hierarchically organized components. A hierarchical organization of functional modules often comes as a by-product of cluster analysis of gene expression data or protein interaction data. Otherwise, our method will automatically build such a hierarchy by directly incorporating the functional taxonomy information into the hierarchy search process and by allowing multi-functional genes to be part of more than one component in the hierarchy. In addition, its underlying HMS scoring metric ensures that functional specificity of the terms across different levels of the hierarchical taxonomy is properly treated. 
We have evaluated our method using Saccharomyces cerevisiae data from KEGG and MIPS databases and several other computationally derived and curated datasets. The code and additional supplemental files can be obtained from http://code.google.com/p/functional-annotation-of-hierarchical-modularity/ (Accessed 2012 March 13).
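
For context, the "enriched functional terms in a bag of genes" approach that the abstract contrasts with HMS is commonly scored with a hypergeometric upper-tail test. The sketch below shows that conventional test only; it is not the HMS metric, whose definition the abstract does not give. The function name and parameter names are illustrative assumptions.

```python
from math import comb


def enrichment_p_value(population, annotated, module_size, hits):
    """Upper-tail hypergeometric p-value, P(X >= hits).

    population:  total number of genes considered
    annotated:   genes in the population carrying the functional term
    module_size: genes in the module being tested
    hits:        genes in the module carrying the term
    """
    upper = min(annotated, module_size)
    total = comb(population, module_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, module_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

A small p-value indicates that the module contains more genes with the term than expected by chance; this flat test ignores the hierarchy among terms, which is the gap the abstract says HMS is meant to address.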

Subject(s)

Algorithms , Computational Biology/methods , Metabolic Networks and Pathways , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Cluster Analysis , Databases, Factual , Protein Interaction Mapping , Saccharomyces cerevisiae Proteins/genetics

19.

Toward Personalized Network Biomarkers in Alzheimer's Disease: Computing Individualized Genomic and Protein Crosstalk Maps.

Padmanabhan, Kanchana; Shpanskaya, Katie; Bello, Gonzalo; Doraiswamy, P Murali; Samatova, Nagiza F.

Front Aging Neurosci ; 9: 315, 2017.

Article in English | MEDLINE | ID: mdl-29085293

20.

SPICE: discovery of phenotype-determining component interplays.

Chen, Zhengzhang; Padmanabhan, Kanchana; Rocha, Andrea M; Shpanskaya, Yekaterina; Mihelcic, James R; Scott, Kathleen; Samatova, Nagiza F.

BMC Syst Biol ; 6: 40, 2012 May 14.

Article in English | MEDLINE | ID: mdl-22583800

ABSTRACT

BACKGROUND: A latent behavior of a biological cell is complex. Deriving the underlying simplicity, or the fundamental rules governing this behavior, has been the Holy Grail of systems biology. Data-driven prediction of the system components and their component interplays that are responsible for the target system's phenotype is a key and challenging step in this endeavor. RESULTS: The proposed approach, which we call System Phenotype-related Interplaying Components Enumerator (SPICE), iteratively enumerates statistically significant system components that are hypothesized (1) to play an important role in defining the specificity of the target system's phenotype(s); (2) to exhibit a functionally coherent behavior, namely, act in a coordinated manner to perform the phenotype-specific function; and (3) to improve the predictive skill of the system's phenotype(s) when used collectively in the ensemble of predictive models. SPICE can be applied to both instance-based data and network-based data. When validated, SPICE effectively identified system components related to three target phenotypes: biohydrogen production, motility, and cancer. Manual curation of the results agreed with the known phenotype-related system components reported in the literature. Additionally, using the identified system components as discriminatory features improved the prediction accuracy by 10% on the phenotype-classification task when compared to a number of state-of-the-art methods applied to eight benchmark microarray data sets. CONCLUSION: We formulate a problem, the enumeration of phenotype-determining system component interplays, and propose an effective methodology (SPICE) to address this problem. SPICE improved identification of cancer-related groups of genes from various microarray data sets and detected groups of genes associated with microbial biohydrogen production and motility, many of which were reported in the literature. 
SPICE also improved the predictive skill of the system's phenotype determination compared to individual classifiers and/or other ensemble methods, such as bagging, boosting, random forest, nearest shrunken centroid, and random forest variable selection method.

Subject(s)

Phenotype , Systems Biology/methods , Algorithms , Gene Regulatory Networks , Hydrogen/metabolism , Hydrogenase/metabolism , Nitrogenase/metabolism
