Search | VHL Regional Portal

The blobulator: a webtool for identification and visual exploration of hydrophobic modularity in protein sequences.

Pitman, Connor; Santiago-McRae, Ezry; Lohia, Ruchi; Bassi, Kaitlin; Joseph, Thomas T; Hansen, Matthew E B; Brannigan, Grace.

bioRxiv ; 2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38293114

ABSTRACT

Motivation: Clusters of hydrophobic residues are known to promote structured protein stability and drive protein aggregation. Recent work has shown that identifying contiguous hydrophobic residue clusters (termed "blobs") is useful in both intrinsically disordered protein (IDP) simulations and human genome studies. However, no graphical interface for this analysis was available. Results: Here, we present the blobulator: an interactive and intuitive web interface to detect intrinsic modularity in any protein sequence based on hydrophobicity. We demonstrate three use cases of the blobulator and show how identifying blobs with biologically relevant parameters provides useful information about a globular protein, two orthologous membrane proteins, and an IDP. Other potential applications are discussed, including predicting protein segments with critical roles in tertiary interactions, providing a definition of local order and disorder with clear edges, and aiding in predicting protein features from sequence. Availability: The blobulator GUI can be found at www.blobulator.branniganlab.org, and the source code with a pip-installable command-line tool can be found on GitHub at www.GitHub.com/BranniganLab/blobulator.
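The abstract does not spell out how blobs are detected. The following is a minimal sketch of the general idea, not the blobulator's implementation. It assumes per-residue Kyte-Doolittle hydropathy rescaled to [0, 1], a 3-residue smoothing window, a hydrophobicity cutoff (`threshold`) and a minimum blob length (`min_len`). The function name `blobulate` and both default values are hypothetical, and every segment that is not a hydrophobic blob is lumped into a single "p" category for simplicity.

```python
from itertools import groupby
from typing import List, Tuple

# Standard Kyte-Doolittle hydropathy scale (range -4.5 to 4.5).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def blobulate(seq: str, threshold: float = 0.4,
              min_len: int = 4) -> List[Tuple[str, int, int]]:
    """Split seq into ("h" or "p", start, end) segments; end is exclusive.

    Hypothetical sketch: the threshold and min_len defaults are assumptions.
    Only the 20 standard one-letter residue codes are supported.
    """
    # Rescale hydropathy to [0, 1] so the cutoff is on a 0-1 scale.
    raw = [(KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq.upper()]
    # Average over a 3-residue window, truncated at the termini.
    smooth = []
    for i in range(len(raw)):
        window = raw[max(0, i - 1): i + 2]
        smooth.append(sum(window) / len(window))
    blobs: List[Tuple[str, int, int]] = []
    pos = 0
    for hydrophobic, run in groupby(h >= threshold for h in smooth):
        length = len(list(run))
        # Hydrophobic runs shorter than min_len do not count as h-blobs.
        kind = "h" if hydrophobic and length >= min_len else "p"
        if blobs and kind == "p" and blobs[-1][0] == "p":
            # Extend the preceding non-hydrophobic segment instead of splitting it.
            blobs[-1] = ("p", blobs[-1][1], pos + length)
        else:
            blobs.append((kind, pos, pos + length))
        pos += length
    return blobs
```

For example, `blobulate("LLLLLKKKKKLLLLL")` returns `[('h', 0, 5), ('p', 5, 10), ('h', 10, 15)]`: two leucine blobs separated by a lysine linker. Smoothing matters at the boundaries, since it decides whether an edge residue falls above or below the cutoff.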

A global high-density chromatin interaction network reveals functional long-range and trans-chromosomal relationships.

Lohia, Ruchi; Fox, Nathan; Gillis, Jesse.

Genome Biol ; 23(1): 238, 2022 11 09.

Article in English | MEDLINE | ID: mdl-36352464

ABSTRACT

BACKGROUND: Chromatin contacts are essential for gene-expression regulation; however, obtaining a high-resolution genome-wide chromatin contact map is still prohibitively expensive owing to large genome sizes and the quadratic scale of pairwise data. Chromosome conformation capture (3C)-based methods such as Hi-C have been extensively used to obtain chromatin contacts. However, since these maps become sparser as the genomic distance between contacts increases, long-range or trans-chromatin contacts are especially challenging to sample. RESULTS: Here, we create a high-density reference genome-wide chromatin contact map using a meta-analytic approach. We integrate 3600 human, 6700 mouse, and 500 fly Hi-C experiments to create species-specific meta-Hi-C chromatin contact maps with 304 billion, 193 billion, and 19 billion contacts in respective species. We validate that meta-Hi-C contact maps are uniquely powered to capture functional chromatin contacts in both cis and trans. We find that while individual dataset Hi-C networks are largely unable to predict any long-range coexpression (median 0.54 AUC), meta-Hi-C networks perform comparably in both cis and trans (0.65 AUC vs 0.64 AUC). Similarly, for long-range expression quantitative trait loci (eQTL), meta-Hi-C contacts outperform all individual Hi-C experiments, providing an improvement over the conventionally used linear genomic distance-based association. Assessing between species, we find patterns of chromatin contact conservation in both cis and trans and strong associations with coexpression even in species for which Hi-C data is lacking. CONCLUSIONS: We have generated an integrated chromatin interaction network which complements a large number of methodological and analytic approaches focused on improved specificity or interpretation.
This high-depth "super-experiment" is surprisingly powerful in capturing long-range functional relationships of chromatin interactions, which are now able to predict coexpression, eQTLs, and cross-species relationships. The meta-Hi-C networks are available at https://labshare.cshl.edu/shares/gillislab/resource/HiC/ .
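The AUC values quoted above summarize how well a contact network ranks coexpressed gene pairs above non-coexpressed ones: 0.5 corresponds to a random ranking and 1.0 to a perfect one. As a generic illustration, not the authors' evaluation pipeline, the sketch below computes the area under the ROC curve from ranks via the Mann-Whitney U statistic. The inputs (one contact score and one binary coexpression label per gene pair) and the function name `auroc` are assumptions.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUROC via the Mann-Whitney U statistic, averaging ranks over ties.

    Hypothetical helper: scores could be contact strengths for gene pairs,
    labels whether each pair is coexpressed.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the block of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative examples")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    # U statistic for the positives, normalized by the number of pos/neg pairs.
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For example, `auroc([0.1, 0.4, 0.35, 0.8], [False, False, True, True])` gives 0.75. The rank formulation avoids building the ROC curve explicitly and handles tied scores by giving them half credit.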

Subject(s)

Chromatin , Chromosomes , Humans , Mice , Animals , Chromatin/genetics , Chromosomes/genetics , Genomics , Chromosome Mapping , Quantitative Trait Loci

Characterization of intrinsically disordered regions in proteins informed by human genetic diversity.

Ahmed, Shehab S; Rifat, Zaara T; Lohia, Ruchi; Campbell, Arthur J; Dunker, A Keith; Rahman, M Sohel; Iqbal, Sumaiya.

PLoS Comput Biol ; 18(3): e1009911, 2022 03.

Article in English | MEDLINE | ID: mdl-35275927

ABSTRACT

All proteomes contain both proteins and polypeptide segments that don't form a defined three-dimensional structure yet are biologically active, called intrinsically disordered proteins and regions (IDPs and IDRs). Most of these IDPs/IDRs lack useful functional annotation, limiting our understanding of their importance for organism fitness. Here we characterized IDRs using protein sequence annotations of functional sites and regions available in the UniProt knowledgebase ("UniProt features": active site, ligand-binding pocket, regions mediating protein-protein interactions, etc.). By measuring the statistical enrichment of twenty-five UniProt features in 981 IDRs of 561 human proteins, we identified eight features that are commonly located in IDRs. We then collected the genetic variant data from the general population and patient-based databases and evaluated the prevalence of population and pathogenic variations in IDPs/IDRs. We observed that some IDRs tolerate 2 to 12 times more single amino acid-substituting missense mutations than synonymous changes in the general population. However, we also found that 37% of all germline pathogenic mutations are located in disordered regions of 96 proteins. Based on the observed-to-expected frequency of mutations, we categorized 34 IDRs in 20 proteins (DDX3X, KIT, RB1, etc.) as intolerant to mutation. Finally, using statistical analysis and a machine learning approach, we demonstrate that mutation-intolerant IDRs carry a distinct signature of functional features. Our study presents a novel approach to assign functional importance to IDRs by leveraging the wealth of available genetic data, which will aid in a deeper understanding of the role of IDRs in biological processes and disease mechanisms.

Subject(s)

Intrinsically Disordered Proteins , Amino Acid Sequence , Genetic Variation/genetics , Humans , Intrinsically Disordered Proteins/chemistry , Protein Conformation , Proteome/genetics

Contiguously hydrophobic sequences are functionally significant throughout the human exome.

Lohia, Ruchi; Hansen, Matthew E B; Brannigan, Grace.

Proc Natl Acad Sci U S A ; 119(12): e2116267119, 2022 03 22.

Article in English | MEDLINE | ID: mdl-35294280

ABSTRACT

Hydrophobic interactions have long been established as essential for stabilizing structured proteins as well as drivers of aggregation, but the impact of hydrophobicity on the functional significance of sequence variants has rarely been considered in a genome-wide context. Here we test the role of hydrophobicity on functional impact across 70,000 disease- and nondisease-associated single-nucleotide polymorphisms (SNPs), using enrichment of disease association as an indicator of functionality. We find that functional impact is uncorrelated with hydrophobicity of the SNP itself and only weakly correlated with the average local hydrophobicity, but is strongly correlated with both the size and minimum hydrophobicity of the contiguously hydrophobic sequence (or "blob") that contains the SNP. Disease association is found to vary by more than sixfold as a function of contiguous hydrophobicity parameters, suggesting utility as a prior for identifying causal variation. We further find signatures of differential selective constraint on hydrophobic blobs and that SNPs splitting a long hydrophobic blob or joining two short hydrophobic blobs are particularly likely to be disease associated. Trends are preserved for both aggregating and nonaggregating proteins, indicating that the role of contiguous hydrophobicity extends well beyond aggregation risk.
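The abstract relates disease association to the size and minimum hydrophobicity of the blob containing a SNP, and to substitutions that split or join blobs. A minimal sketch of these per-SNP features follows. It assumes a precomputed per-residue hydrophobicity list on a 0 to 1 scale and a cutoff of 0.4, and it treats any contiguous run at or above the cutoff as a blob, with no minimum length. The cutoff, the scale and the function names `containing_blob` and `classify_substitution` are hypothetical, not values or methods stated in the abstract.

```python
from typing import Optional, Sequence, Tuple


def containing_blob(h: Sequence[float], pos: int,
                    threshold: float = 0.4) -> Optional[Tuple[int, float]]:
    """Return (size, minimum hydrophobicity) of the run of residues with
    h >= threshold that contains pos, or None if pos is not hydrophobic."""
    if h[pos] < threshold:
        return None
    start, end = pos, pos
    while start > 0 and h[start - 1] >= threshold:
        start -= 1
    while end + 1 < len(h) and h[end + 1] >= threshold:
        end += 1
    run = h[start:end + 1]
    return len(run), min(run)


def classify_substitution(h: Sequence[float], pos: int, new_value: float,
                          threshold: float = 0.4) -> str:
    """Label a substitution that changes hydrophobicity h[pos] to new_value."""
    left = pos > 0 and h[pos - 1] >= threshold
    right = pos + 1 < len(h) and h[pos + 1] >= threshold
    was_h, now_h = h[pos] >= threshold, new_value >= threshold
    if was_h and not now_h and left and right:
        return "splits blob"  # a hydrophobic run is cut in two
    if not was_h and now_h and left and right:
        return "joins blobs"  # a polar gap between two runs is filled
    return "no split/join"
```

With `h = [0.9, 0.8, 0.7, 0.9, 0.1, 0.8, 0.9]`, `containing_blob(h, 1)` returns `(4, 0.7)`, and raising position 4 to 0.8 is classified as `"joins blobs"`. The sizes returned by `containing_blob` before and after the change would then indicate whether a long blob was split or two short ones were joined.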

Subject(s)

Exome , Genome, Human , Amino Acids/chemistry , Exome/genetics , Genome, Human/genetics , Humans , Hydrophobic and Hydrophilic Interactions , Proteins/chemistry

Sequence specificity despite intrinsic disorder: How a disease-associated Val/Met polymorphism rearranges tertiary interactions in a long disordered protein.

Lohia, Ruchi; Salari, Reza; Brannigan, Grace.

PLoS Comput Biol ; 15(10): e1007390, 2019 10.

Article in English | MEDLINE | ID: mdl-31626641

ABSTRACT

The role of electrostatic interactions and mutations that change charge states in intrinsically disordered proteins (IDPs) is well-established, but many disease-associated mutations in IDPs are charge-neutral. The Val66Met single nucleotide polymorphism (SNP) in precursor brain-derived neurotrophic factor (BDNF) is one of the earliest SNPs to be associated with neuropsychiatric disorders, and the underlying molecular mechanism is unknown. Here we report on over 250 µs of fully-atomistic, explicit solvent, temperature replica-exchange molecular dynamics (MD) simulations of the 91-residue BDNF prodomain, for both the V66 and M66 sequences. The simulations were able to correctly reproduce the location of both local and non-local secondary structure changes due to the Val66Met mutation, when compared with NMR spectroscopy. We find that the change in local structure is mediated via entropic and sequence specific effects. We developed a hierarchical sequence-based framework for analysis and conceptualization, which first identifies "blobs" of 4-15 residues representing local globular regions or linkers. We use this framework within a novel test for enrichment of higher-order (tertiary) structure in disordered proteins; the size and shape of each blob is extracted from MD simulation of the real protein (RP), and used to parameterize a self-avoiding heterogeneous polymer (SAHP). The SAHP version of the BDNF prodomain suggested a protein segmented into three regions, with a central long, highly disordered polyampholyte linker separating two globular regions. This effective segmentation was also observed in full simulations of the RP, but the Val66Met substitution significantly increased interactions across the linker, as well as the number of participating residues. The Val66Met substitution replaces β-bridging between V66 and V94 (on either side of the linker) with specific side-chain interactions between M66 and M95.
The protein backbone in the vicinity of M95 is then free to form β-bridges with residues 31-41 near the N-terminus, which condenses the protein. A significant role for Met/Met interactions is consistent with previously-observed non-local effects of the Val66Met SNP, as well as established interactions between the Met66 sequence and a Met-rich receptor that initiates neuronal growth cone retraction.

Subject(s)

Brain-Derived Neurotrophic Factor/genetics , Intrinsically Disordered Proteins/genetics , Protein Structure, Tertiary/genetics , Alleles , Brain-Derived Neurotrophic Factor/physiology , Gene Frequency/genetics , Genotype , Humans , Intrinsically Disordered Proteins/metabolism , Methionine , Molecular Dynamics Simulation/statistics & numerical data , Polymorphism, Single Nucleotide/genetics , Protein Precursors , Protein Structure, Tertiary/physiology , Substrate Specificity/genetics , Valine

A Streamlined, General Approach for Computing Ligand Binding Free Energies and Its Application to GPCR-Bound Cholesterol.

Salari, Reza; Joseph, Thomas; Lohia, Ruchi; Hénin, Jérôme; Brannigan, Grace.

J Chem Theory Comput ; 14(12): 6560-6573, 2018 Dec 11.

Article in English | MEDLINE | ID: mdl-30358394

ABSTRACT

The theory of receptor-ligand binding equilibria has long been well-established in biochemistry, and was primarily constructed to describe dilute aqueous solutions. Accordingly, few computational approaches have been developed for making quantitative predictions of binding probabilities in environments other than dilute isotropic solution. Existing techniques, ranging from simple automated docking procedures to sophisticated thermodynamics-based methods, have been developed with soluble proteins in mind. Biologically and pharmacologically relevant protein-ligand interactions often occur in complex environments, including lamellar phases like membranes and crowded, nondilute solutions. Here, we revisit the theoretical bases of ligand binding equilibria, avoiding overly specific assumptions that are nearly always made when describing receptor-ligand binding. Building on this formalism, we extend the asymptotically exact Alchemical Free Energy Perturbation (AFEP) technique to quantifying occupancies of sites on proteins in a complex bulk, including phase-separated, anisotropic, or nondilute solutions, using a thermodynamically consistent and easily generalized approach that resolves several ambiguities of current frameworks. To incorporate the complex bulk without overcomplicating the overall thermodynamic cycle, we simplify the common approach for ligand restraints by using a single distance-from-bound-configuration (DBC) ligand restraint during AFEP decoupling from protein. DBC restraints should be generalizable to binding modes of most small molecules, even those with strong orientational dependence. We apply this approach to compute the likelihood that membrane cholesterol binds to known crystallographic sites on three GPCRs (β2-adrenergic, 5HT-2B, and µ-opioid) at a range of concentrations. Nonideality of cholesterol in a binary cholesterol:phosphatidylcholine (POPC) bilayer is characterized and consistently incorporated into the interpretation.
We find that the three sites exhibit very different affinities for cholesterol: The site on the adrenergic receptor is predicted to be high affinity, with 50% occupancy for 1:10⁹ CHOL:POPC mixtures. The sites on the 5HT-2B and µ-opioid receptor are predicted to be lower affinity, with 50% occupancy for 1:10³ CHOL:POPC and 1:10² CHOL:POPC, respectively. These results could not have been predicted from the crystal structures alone.

Subject(s)

Cholesterol/metabolism , Molecular Dynamics Simulation , Receptors, G-Protein-Coupled/metabolism , Ligands , Lipid Bilayers/chemistry , Lipid Bilayers/metabolism , Protein Binding , Protein Conformation , Receptors, G-Protein-Coupled/chemistry

Comparative metagenome analysis of an Alaskan glacier.

Choudhari, Sulbha; Lohia, Ruchi; Grigoriev, Andrey.

J Bioinform Comput Biol ; 12(2): 1441003, 2014 Apr.

Article in English | MEDLINE | ID: mdl-24712530

ABSTRACT

The temperature in the Arctic region has been increasing in the recent past, accompanied by melting of its glaciers. We took a snapshot of the current microbial inhabitation of an Alaskan glacier (which can be considered one of the simplest possible ecosystems) by using metagenomic sequencing of 16S rRNA recovered from ice/snow samples. Somewhat contrary to our expectations and earlier estimates, a rich and diverse microbial population of more than 2,500 species was revealed, including several species of Archaea that have been identified for the first time in the glaciers of the Northern hemisphere. The most prominent bacterial groups found were Proteobacteria, Bacteroidetes, and Firmicutes. Firmicutes were not reported in large numbers in a previously studied Alpine glacier but were dominant in an Antarctic subglacial lake. Representatives of Cyanobacteria, Actinobacteria and Planctomycetes were among the most numerous, likely reflecting the dependence of the ecosystem on the energy obtained through photosynthesis and close links with the microbial community of the soil. Principal component analysis (PCA) of nucleotide word frequency revealed distinct sequence clusters for different taxonomic groups in the Alaskan glacier community and separate clusters for the glacial communities from other regions of the world. Comparative analysis of the community composition and bacterial diversity present in the Byron glacier in Alaska with other environments showed larger overlap with an Arctic soil than with a high Arctic lake, indicating patterns of community exchange and suggesting that these bacteria may play an important role in soil development during glacial retreat.
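PCA of nucleotide word frequency starts from one frequency vector per sequence, counting every overlapping word (k-mer) of a fixed length. The sketch below builds such vectors using only the standard library. The word length `k = 4`, the function name `word_frequencies` and the choice to skip words containing non-ACGT symbols are assumptions (the abstract does not state them), and the PCA step itself is not shown.

```python
from collections import Counter
from itertools import product
from typing import List


def word_frequencies(seq: str, k: int = 4) -> List[float]:
    """Relative frequency of every DNA word of length k, in lexicographic order.

    Hypothetical helper: words containing N or other non-ACGT symbols are ignored.
    """
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = seq.upper()
    # Count all overlapping k-mers in the sequence.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words)
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]
```

Each sequence maps to a vector of length 4^k (256 for k = 4) that sums to 1, so sequences of different lengths become comparable. Stacking these vectors as rows gives the matrix that a PCA routine would then project onto its leading components.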

Subject(s)

Bacteria/genetics , Bacteria/isolation & purification , Ice Cover/microbiology , Metagenome/genetics , Microbial Consortia/genetics , RNA, Ribosomal, 16S/genetics , Alaska , Bacteria/classification , Base Sequence , Molecular Sequence Data
