Search | VHL Search Portal

Results 1 - 11 of 11

State-of-the-art computational methods to predict protein-protein interactions with high accuracy and coverage.

Kewalramani, Neal; Emili, Andrew; Crovella, Mark.

Proteomics ; 23(21-22): e2200292, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37401192

ABSTRACT

Prediction of protein-protein interactions (PPIs) commonly involves a significant computational component. Rapid recent advances in the power of computational methods for protein interaction prediction motivate a review of the state-of-the-art. We review the major approaches, organized according to the primary source of data utilized: protein sequence, protein structure, and protein co-abundance. The advent of deep learning (DL) has brought with it significant advances in interaction prediction, and we show how DL is used for each source data type. We review the literature taxonomically, present example case studies in each category, and conclude with observations about the strengths and weaknesses of machine learning methods in the context of the principal sources of data for protein interaction prediction.

Subject(s)

Protein Interaction Mapping , Proteins , Protein Interaction Mapping/methods , Proteins/metabolism , Machine Learning , Amino Acid Sequence , Computational Biology/methods

Dynamic remodeling of Escherichia coli interactome in response to environmental perturbations.

Youssef, Ahmed; Bian, Fei; Panikov, Nicolai S; Crovella, Mark; Emili, Andrew.

Proteomics ; 23(21-22): e2200404, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37248827

ABSTRACT

Proteins play an essential role in the vital biological processes governing cellular functions. Most proteins function as members of macromolecular machines, with the network of interacting proteins revealing the molecular mechanisms driving the formation of these complexes. Profiling the physiology-driven remodeling of these interactions within different contexts constitutes a crucial component to achieving a comprehensive systems-level understanding of interactome dynamics. Here, we apply co-fractionation mass spectrometry and computational modeling to quantify and profile the interactions of ~2000 proteins in the bacterium Escherichia coli cultured under 10 distinct culture conditions. The resulting quantitative co-elution patterns revealed large-scale condition-dependent interaction remodeling among protein complexes involved in diverse biochemical pathways in response to the unique environmental challenges. The network-level analysis highlighted interactome-wide biophysical properties and structural patterns governing interaction remodeling. Our results provide evidence of the local and global plasticity of the E. coli interactome along with a rigorous generalizable framework to define protein interaction specificity. We provide an accompanying interactive web application to facilitate the exploration of these rewired networks.

Subject(s)

Escherichia coli , Proteins , Escherichia coli/metabolism , Proteins/metabolism , Software , Mass Spectrometry , Protein Interaction Mapping/methods

Frog embryos use multiple levels of temporal pattern in risk assessment for vibration-cued escape hatching.

Jung, Julie; Guo, Ming; Crovella, Mark E; McDaniel, J Gregory; Warkentin, Karen M.

Anim Cogn ; 25(6): 1527-1544, 2022 Dec.

Article in English | MEDLINE | ID: mdl-35668245

ABSTRACT

Stereotyped signals can be a fast, effective means of communicating danger, but animals assessing predation risk must often use more variable incidental cues. Red-eyed treefrog (Agalychnis callidryas) embryos hatch prematurely to escape from egg predators, cued by vibrations in attacks, but benign rain generates vibrations with overlapping properties. Facing high false-alarm costs, embryos use multiple vibration properties to inform hatching, including temporal pattern elements such as pulse durations and inter-pulse intervals. However, measures of snake and rain vibration as simple pulse-interval patterns are a poor match to embryo behavior. We used vibration playbacks to assess if embryos use a second level of temporal pattern, long gaps within a rhythmic pattern, as indicators of risk. Long vibration-free periods are common during snake attacks but absent from hard rain. Long gaps after a few initial vibrations increase the hatching response to a subsequent vibration series. Moreover, vibration patterns as short as three pulses, separated by long periods of silence, can induce as much hatching as rhythmic pulse series with five times more vibration. Embryos can retain information that increases hatching over at least 45 s of silence. This work highlights that embryo behavior is contextually modulated in complex ways. Identical vibration pulses, pulse groups, and periods of silence can be treated as risk cues in some contexts and not in others. Embryos employ a multi-faceted decision-making process to effectively distinguish between risk cues and benign stimuli.

Subject(s)

Cues , Embryo, Nonmammalian , Animals , Embryo, Nonmammalian/physiology , Anura/physiology , Snakes , Risk Assessment

Matrix (factorization) reloaded: flexible methods for imputing genetic interactions with cross-species and side information.

Fan, Jason; Li, Xuan Cindy; Crovella, Mark; Leiserson, Mark D M.

Bioinformatics ; 36(Suppl_2): i866-i874, 2020 12 30.

Article in English | MEDLINE | ID: mdl-33381837

ABSTRACT

MOTIVATION: Mapping genetic interactions (GIs) can reveal important insights into cellular function and has potential translational applications. There has been great progress in developing high-throughput experimental systems for measuring GIs (e.g. with double knockouts) as well as in defining computational methods for inferring (imputing) unknown interactions. However, existing computational methods for imputation have largely been developed for and applied in baker's yeast, even as experimental systems have begun to allow measurements in other contexts. Importantly, existing methods face a number of limitations in requiring specific side information and with respect to computational cost. Further, few have addressed how GIs can be imputed when data are scarce. RESULTS: In this article, we address these limitations by presenting a new imputation framework, called Extensible Matrix Factorization (EMF). EMF is a framework of composable models that flexibly exploit cross-species information in the form of GI data across multiple species, and arbitrary side information in the form of kernels (e.g. from protein-protein interaction networks). We perform a rigorous set of experiments on these models in matched GI datasets from baker's and fission yeast. These include the first such experiments on genome-scale GI datasets in multiple species in the same study. We find that EMF models that exploit side and cross-species information improve imputation, especially in data-scarce settings. Further, we show that EMF outperforms the state-of-the-art deep learning method, even when using strictly less data, and incurs orders of magnitude less computational cost. AVAILABILITY: Implementations of models and experiments are available at: https://github.com/lrgr/EMF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Epistasis, Genetic

Functional protein representations from biological networks enable diverse cross-species inference.

Fan, Jason; Cannistra, Anthony; Fried, Inbar; Lim, Tim; Schaffner, Thomas; Crovella, Mark; Hescott, Benjamin; Leiserson, Mark D M.

Nucleic Acids Res ; 47(9): e51, 2019 05 21.

Article in English | MEDLINE | ID: mdl-30847485

ABSTRACT

Transferring knowledge between species is key for many biological applications, but is complicated by divergent and convergent evolution. Many current approaches for this problem leverage sequence and interaction network data to transfer knowledge across species, exemplified by network alignment methods. While these techniques do well, they are limited in scope, creating metrics to address one specific problem or task. We take a different approach by creating an environment where multiple knowledge transfer tasks can be performed using the same protein representations. Specifically, our kernel-based method, MUNK, integrates sequence and network structure to create functional protein representations, embedding proteins from different species in the same vector space. First we show proteins in different species that are close in MUNK-space are functionally similar. Next, we use these representations to share knowledge of synthetic lethal interactions between species. Importantly, we find that the results using MUNK-representations are at least as accurate as existing algorithms for these tasks. Finally, we generalize the notion of a phenolog ('orthologous phenotype') to use functionally similar proteins (i.e. those with similar representations). We demonstrate the utility of this broadened notion by using it to identify known phenologs and novel non-obvious ones supported by current research.

Subject(s)

Computational Biology/methods , Proteins/genetics , Synthetic Lethal Mutations/genetics , Algorithms , Animals , Humans , Models, Animal , Protein Interaction Mapping/methods , Sequence Alignment , Sequence Analysis, Protein/methods , Species Specificity

One for all and all for One: Improving replication of genetic studies through network diffusion.

Lancour, Daniel; Naj, Adam; Mayeux, Richard; Haines, Jonathan L; Pericak-Vance, Margaret A; Schellenberg, Gerard D; Crovella, Mark; Farrer, Lindsay A; Kasif, Simon.

PLoS Genet ; 14(4): e1007306, 2018 04.

Article in English | MEDLINE | ID: mdl-29684019

ABSTRACT

Improving accuracy in genetic studies would greatly accelerate understanding the genetic basis of complex diseases. One approach to achieve such an improvement for risk variants identified by the genome wide association study (GWAS) approach is to incorporate previously known biology when screening variants across the genome. We developed a simple approach for improving the prioritization of candidate disease genes that incorporates a network diffusion of scores from known disease genes using a protein network and a novel integration with GWAS risk scores, and tested this approach on a large Alzheimer disease (AD) GWAS dataset. Using a statistical bootstrap approach, we cross-validated the method and for the first time showed that a network approach improves the expected replication rates in GWAS studies. Several novel AD genes were predicted including CR2, SHARPIN, and PTPN2. Our re-prioritized results are enriched for established known AD-associated biological pathways including inflammation, immune response, and metabolism, whereas standard non-prioritized results were not. Our findings support a strategy of considering network information when investigating genetic risk factors.

Subject(s)

Alzheimer Disease/genetics , Genome-Wide Association Study , Alzheimer Disease/metabolism , Datasets as Topic , Humans , Protein Interaction Maps , Reproducibility of Results , Risk Factors , Support Vector Machine

DESP demixes cell-state profiles from dynamic bulk molecular measurements.

Youssef, Ahmed; Paul, Indranil; Crovella, Mark; Emili, Andrew.

Cell Rep Methods ; 4(3): 100729, 2024 Mar 25.

Article in English | MEDLINE | ID: mdl-38490205

ABSTRACT

Understanding the dynamic expression of proteins and other key molecules driving phenotypic remodeling in development and pathobiology has garnered widespread interest, yet the exploration of these systems at the foundational resolution of the underlying cell states has been significantly limited by technical constraints. Here, we present DESP, an algorithm designed to leverage independent estimates of cell-state proportions, such as from single-cell RNA sequencing, to resolve the relative contributions of cell states to bulk molecular measurements, most notably quantitative proteomics, recorded in parallel. We applied DESP to an in vitro model of the epithelial-to-mesenchymal transition and demonstrated its ability to accurately reconstruct cell-state signatures from bulk-level measurements of both the proteome and transcriptome, providing insights into transient regulatory mechanisms. DESP provides a generalizable computational framework for modeling the relationship between bulk and single-cell molecular measurements, enabling the study of proteomes and other molecular profiles at the cell-state level using established bulk-level workflows.

Subject(s)

Proteomics , Transcriptome , Proteome/genetics , Algorithms , Epithelial-Mesenchymal Transition

Interpretable network propagation with application to expanding the repertoire of human proteins that interact with SARS-CoV-2.

Law, Jeffrey N; Akers, Kyle; Tasnina, Nure; Santina, Catherine M Della; Deutsch, Shay; Kshirsagar, Meghana; Klein-Seetharaman, Judith; Crovella, Mark; Rajagopalan, Padmavathy; Kasif, Simon; Murali, T M.

Gigascience ; 10(12)2021 12 29.

Article in English | MEDLINE | ID: mdl-34966926

ABSTRACT

BACKGROUND: Network propagation has been widely used for nearly 20 years to predict gene functions and phenotypes. Despite the popularity of this approach, little attention has been paid to the question of provenance tracing in this context, e.g., determining how much any experimental observation in the input contributes to the score of every prediction. RESULTS: We design a network propagation framework with 2 novel components and apply it to predict human proteins that directly or indirectly interact with SARS-CoV-2 proteins. First, we trace the provenance of each prediction to its experimentally validated sources, which in our case are human proteins experimentally determined to interact with viral proteins. Second, we design a technique that helps to reduce the manual adjustment of parameters by users. We find that for every top-ranking prediction, the highest contribution to its score arises from a direct neighbor in a human protein-protein interaction network. We further analyze these results to develop functional insights on SARS-CoV-2 that expand on known biology such as the connection between endoplasmic reticulum stress, HSPA5, and anti-clotting agents. CONCLUSIONS: We examine how our provenance-tracing method can be generalized to a broad class of network-based algorithms. We provide a useful resource for the SARS-CoV-2 community that implicates many previously undocumented proteins with putative functional relationships to viral infection. This resource includes potential drugs that can be opportunistically repositioned to target these proteins. We also discuss how our overall framework can be extended to other, newly emerging viruses.

Subject(s)

COVID-19 , SARS-CoV-2 , Algorithms , Humans , Protein Interaction Maps , Proteins/metabolism

Analysis of brain region-specific co-expression networks reveals clustering of established and novel genes associated with Alzheimer disease.

Lancour, Daniel; Dupuis, Josée; Mayeux, Richard; Haines, Jonathan L; Pericak-Vance, Margaret A; Schellenberg, Gerard D; Crovella, Mark; Farrer, Lindsay A; Kasif, Simon.

Alzheimers Res Ther ; 12(1): 103, 2020 09 02.

Article in English | MEDLINE | ID: mdl-32878640

ABSTRACT

BACKGROUND: Identifying and understanding the functional role of genetic risk factors for Alzheimer disease (AD) has been complicated by the variability of genetic influences across brain regions and confounding with age-related neurodegeneration. METHODS: A gene co-expression network was constructed using data obtained from the Allen Brain Atlas for multiple brain regions (cerebral cortex, cerebellum, and brain stem) in six individuals. Gene network analyses were seeded with 52 reproducible (i.e., established) AD (RAD) genes. Genome-wide association study summary data were integrated with the gene co-expression results and phenotypic information (i.e., memory and aging-related outcomes) from gene knockout studies in Drosophila to generate rankings for other genes that may have a role in AD. RESULTS: We found that co-expression of the RAD genes is strongest in the cortical regions where neurodegeneration due to AD is most severe. There was significant evidence for two novel AD-related genes including EPS8 (FDR p = 8.77 × 10⁻³) and HSPA2 (FDR p = 0.245). CONCLUSIONS: Our findings indicate that AD-related risk factors are potentially associated with brain region-specific effects on gene expression that can be detected using a gene network approach.

Subject(s)

Alzheimer Disease , Adaptor Proteins, Signal Transducing , Alzheimer Disease/genetics , Brain/diagnostic imaging , Cluster Analysis , Gene Expression Profiling , Genome-Wide Association Study , Humans

Single-cell transcriptional networks in differentiating preadipocytes suggest drivers associated with tissue heterogeneity.

Ramirez, Alfred K; Dankel, Simon N; Rastegarpanah, Bashir; Cai, Weikang; Xue, Ruidan; Crovella, Mark; Tseng, Yu-Hua; Kahn, C Ronald; Kasif, Simon.

Nat Commun ; 11(1): 2117, 2020 04 30.

Article in English | MEDLINE | ID: mdl-32355218

ABSTRACT

White adipose tissue plays an important role in physiological homeostasis and metabolic disease. Different fat depots have distinct metabolic and inflammatory profiles and are differentially associated with disease risk. It is unclear whether these differences are intrinsic to the pre-differentiated stage. Using single-cell RNA sequencing, a unique network methodology and a data integration technique, we predict metabolic phenotypes in differentiating cells. Single-cell RNA-seq profiles of human preadipocytes during adipogenesis in vitro identify at least two distinct classes of subcutaneous white adipocytes. These differences in gene expression are separate from the process of browning and beiging. Using a systems biology approach, we identify a new network of zinc-finger proteins that are expressed in one class of preadipocytes and is potentially involved in regulating adipogenesis. Our findings provide a deeper understanding of both the heterogeneity of white adipocytes and their link to normal metabolism and disease.

Subject(s)

Adipocytes, White/cytology , Adipogenesis , Cell Differentiation/genetics , Single-Cell Analysis , Transcription, Genetic , Cells, Cultured , Cluster Analysis , Gene Expression Profiling , Gene Regulatory Networks , Glucose/metabolism , Humans , Oxygen Consumption , Phenotype , Polymerase Chain Reaction , Protein Interaction Mapping , Sequence Analysis, RNA , Systems Biology

Going the distance for protein function prediction: a new distance metric for protein interaction networks.

Cao, Mengfei; Zhang, Hao; Park, Jisoo; Daniels, Noah M; Crovella, Mark E; Cowen, Lenore J; Hescott, Benjamin.

PLoS One ; 8(10): e76339, 2013.

Article in English | MEDLINE | ID: mdl-24194834

ABSTRACT

In protein-protein interaction (PPI) networks, functional similarity is often inferred based on the function of directly interacting proteins, or more generally, some notion of interaction network proximity among proteins in a local neighborhood. Prior methods typically measure proximity as the shortest-path distance in the network, but this has only a limited ability to capture fine-grained neighborhood distinctions, because most proteins are close to each other, and there are many ties in proximity. We introduce diffusion state distance (DSD), a new metric based on a graph diffusion property, designed to capture finer-grained distinctions in proximity for transfer of functional annotation in PPI networks. We present a tool that, when input a PPI network, will output the DSD distances between every pair of proteins. We show that replacing the shortest-path metric by DSD improves the performance of classical function prediction methods across the board.

Subject(s)

Algorithms , Models, Genetic , Protein Interaction Maps/genetics , Proteins/metabolism
