Search | Nursing VHL Search Portal

1.

DescribePROT in 2023: more, higher-quality and experimental annotations and improved data download options.

Basu, Sushmita; Zhao, Bi; Biró, Bálint; Faraggi, Eshel; Gsponer, Jörg; Hu, Gang; Kloczkowski, Andrzej; Malhis, Nawar; Mirdita, Milot; Söding, Johannes; Steinegger, Martin; Wang, Duolin; Wang, Kui; Xu, Dong; Zhang, Jian; Kurgan, Lukasz.

Nucleic Acids Res ; 52(D1): D426-D433, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37933852

ABSTRACT

The DescribePROT database of amino acid-level descriptors of protein structures and functions has been substantially expanded since its release in 2020. This expansion includes a substantial increase in the size, scope, and quality of the underlying data, the addition of experimental structural information, the inclusion of new data download options, and an upgraded graphical interface. DescribePROT currently covers 19 structural and functional descriptors for proteins in 273 reference proteomes generated by 11 accurate and complementary predictive tools. Users can search our resource in multiple ways, interact with the data using the graphical interface, and download data at various scales including individual proteins, entire proteomes, and the whole database. The annotations in DescribePROT are useful for a broad spectrum of studies, including investigations of protein structure and function, development and validation of predictive tools, and efforts to understand the molecular underpinnings of diseases and to develop therapeutics. DescribePROT can be freely accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.

Subject(s)

Amino Acids , Proteome , Proteome/chemistry , Databases, Factual

2.

DescribePROT: database of amino acid-level protein structure and function predictions.

Zhao, Bi; Katuwawala, Akila; Oldfield, Christopher J; Dunker, A Keith; Faraggi, Eshel; Gsponer, Jörg; Kloczkowski, Andrzej; Malhis, Nawar; Mirdita, Milot; Obradovic, Zoran; Söding, Johannes; Steinegger, Martin; Zhou, Yaoqi; Kurgan, Lukasz.

Nucleic Acids Res ; 49(D1): D298-D308, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33119734

ABSTRACT

We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.

Subject(s)

Amino Acids/chemistry , Databases, Protein , Genome , Proteins/genetics , Proteome/genetics , Software , Amino Acid Sequence , Amino Acids/metabolism , Animals , Archaea/genetics , Archaea/metabolism , Bacteria/genetics , Bacteria/metabolism , Binding Sites , Conserved Sequence , Fungi/genetics , Fungi/metabolism , Humans , Internet , Plants/genetics , Plants/metabolism , Prokaryotic Cells/metabolism , Protein Binding , Protein Structure, Secondary , Proteins/chemistry , Proteins/classification , Proteins/metabolism , Proteome/chemistry , Proteome/metabolism , Sequence Analysis, Protein , Viruses/genetics , Viruses/metabolism

3.

LIST-S2: taxonomy based sorting of deleterious missense mutations across species.

Malhis, Nawar; Jacobson, Matthew; Jones, Steven J M; Gsponer, Jörg.

Nucleic Acids Res ; 48(W1): W154-W161, 2020 07 02.

Article in English | MEDLINE | ID: mdl-32352516

ABSTRACT

The separation of deleterious from benign mutations remains a key challenge in the interpretation of genomic data. Computational methods used to sort mutations based on their potential deleteriousness rely largely on conservation measures derived from sequence alignments. Here, we introduce LIST-S2, a successor to our previously developed approach LIST, which aims to exploit local sequence identity and taxonomy distances in quantifying the conservation of human protein sequences. Unlike its predecessor, LIST-S2 is not limited to human sequences but can assess conservation and make predictions for sequences from any organism. Moreover, we provide a web-tool and downloadable software to compute and visualize the deleteriousness of mutations in user-provided sequences. This web-tool contains an HTML interface and a RESTful API to submit and manage sequences as well as a browsable set of precomputed predictions for a large number of UniProtKB protein sequences of common taxa. LIST-S2 is available at: https://list-s2.msl.ubc.ca/.

Subject(s)

Mutation, Missense , Software , Animals , Germ-Line Mutation , Humans , Neoplasms/genetics , Sequence Analysis, Protein

4.

Computational Disorder Analysis in Ethylene Response Factors Uncovers Binding Motifs Critical to Their Diverse Functions.

Sun, Xiaolin; Malhis, Nawar; Zhao, Bi; Xue, Bin; Gsponer, Joerg; Rikkerink, Erik H A.

Int J Mol Sci ; 21(1), 2019 Dec 20.

Article in English | MEDLINE | ID: mdl-31861935

ABSTRACT

APETALA2/ETHYLENE RESPONSE FACTOR transcription factors (AP2/ERFs) play crucial roles in adaptation to stresses such as those caused by pathogens, wounding and cold. Although their name suggests a specific role in ethylene signalling, some ERF members also co-ordinate signals regulated by other key plant stress hormones such as jasmonate, abscisic acid and salicylate. We analysed a set of ERF proteins from three divergent plant species for intrinsically disordered regions containing conserved segments involved in protein-protein interaction known as Molecular Recognition Features (MoRFs). Then we correlated the MoRFs identified with a number of known functional features where these could be identified. Our analyses suggest that MoRFs, with plasticity in their disordered surroundings, are highly functional and may have been shuffled between related protein families driven by selection. A particularly important role may be played by the alpha helical component of the structured DNA binding domain to permit specificity. We also present examples of computationally identified MoRFs that have no known function and provide a valuable conceptual framework to link both disordered and ordered structural features within this family to diverse function.

Subject(s)

Ethylenes/metabolism , Plant Growth Regulators/metabolism , Plant Proteins/metabolism , Plants/metabolism , Transcription Factors/metabolism , Amino Acid Sequence , Gene Expression Regulation, Plant , Models, Molecular , Phylogeny , Plant Proteins/chemistry , Plant Proteins/genetics , Plants/chemistry , Plants/genetics , Protein Interaction Domains and Motifs , Protein Interaction Maps , Stress, Physiological , Transcription Factors/chemistry , Transcription Factors/genetics

5.

MoRFchibi SYSTEM: software tools for the identification of MoRFs in protein sequences.

Malhis, Nawar; Jacobson, Matthew; Gsponer, Jörg.

Nucleic Acids Res ; 44(W1): W488-93, 2016 Jul 08.

Article in English | MEDLINE | ID: mdl-27174932

ABSTRACT

Molecular recognition features, MoRFs, are short segments within longer disordered protein regions that bind to globular protein domains in a process known as disorder-to-order transition. MoRFs have been found to play a significant role in signaling and regulatory processes in cells. High-confidence computational identification of MoRFs remains an important challenge. In this work, we introduce MoRFchibi SYSTEM, which contains three MoRF predictors: MoRFCHiBi, a basic predictor best suited as a component in other applications; MoRFCHiBi_Light, ideal for high-throughput predictions; and MoRFCHiBi_Web, slower than the other two but best for high accuracy predictions. Results show that MoRFchibi SYSTEM provides more than double the precision of other predictors. MoRFchibi SYSTEM is available in three different forms: as an HTML web server, a RESTful web server and downloadable software at: http://www.chibi.ubc.ca/faculty/joerg-gsponer/gsponer-lab/software/morf_chibi/.

Subject(s)

Amino Acid Sequence , Internet , Proteins/chemistry , Proteins/metabolism , Software , Benchmarking , CD3 Complex/chemistry , CD3 Complex/metabolism , Datasets as Topic , High-Throughput Screening Assays , Humans , Protein Binding

6.

Computational identification of MoRFs in protein sequences.

Malhis, Nawar; Gsponer, Jörg.

Bioinformatics ; 31(11): 1738-44, 2015 Jun 01.

Article in English | MEDLINE | ID: mdl-25637562

ABSTRACT

MOTIVATION: Intrinsically disordered regions of proteins play an essential role in the regulation of various biological processes. Key to their regulatory function is the binding of molecular recognition features (MoRFs) to globular protein domains in a process known as a disorder-to-order transition. Predicting the location of MoRFs in protein sequences with high accuracy remains an important computational challenge. METHOD: In this study, we introduce MoRFCHiBi, a new computational approach for fast and accurate prediction of MoRFs in protein sequences. MoRFCHiBi combines the outcomes of two support vector machine (SVM) models that take advantage of two different kernels with high noise tolerance. The first, SVMS, is designed to extract maximal information from the general contrast in amino acid compositions between MoRFs, their surrounding regions (Flanks), and the remainders of the sequences. The second, SVMT, is used to identify similarities between regions in a query sequence and MoRFs of the training set. RESULTS: We evaluated the performance of our predictor by comparing its results with those of two currently available MoRF predictors, MoRFpred and ANCHOR. Using three test sets that have previously been collected and used to evaluate MoRFpred and ANCHOR, we demonstrate that MoRFCHiBi outperforms the other predictors with respect to different evaluation metrics. In addition, MoRFCHiBi is downloadable and fast, which makes it useful as a component in other computational prediction tools. AVAILABILITY AND IMPLEMENTATION: http://www.chibi.ubc.ca/morf/.

Subject(s)

Intrinsically Disordered Proteins/chemistry , Sequence Analysis, Protein/methods , Software , Algorithms , Amino Acids , Computational Biology/methods , Protein Structure, Tertiary , Support Vector Machine

7.

Tutorial: a guide for the selection of fast and accurate computational tools for the prediction of intrinsic disorder in proteins.

Kurgan, Lukasz; Hu, Gang; Wang, Kui; Ghadermarzi, Sina; Zhao, Bi; Malhis, Nawar; Erdos, Gábor; Gsponer, Jörg; Uversky, Vladimir N; Dosztányi, Zsuzsanna.

Nat Protoc ; 18(11): 3157-3172, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37740110

ABSTRACT

Intrinsic disorder is instrumental for a wide range of protein functions, and its analysis, using computational predictions from primary structures, complements secondary and tertiary structure-based approaches. In this Tutorial, we provide an overview and comparison of 23 publicly available computational tools with complementary parameters useful for intrinsic disorder prediction, partly relying on results from the Critical Assessment of protein Intrinsic Disorder prediction experiment. We consider factors such as accuracy, runtime, availability and the need for functional insights. The selected tools are available as web servers and downloadable programs, offer state-of-the-art predictions and can be used in a high-throughput manner. We provide examples and instructions for the selected tools to illustrate practical aspects related to the submission, collection and interpretation of predictions, as well as the timing and their limitations. We highlight two predictors for intrinsically disordered proteins, flDPnn as accurate and fast and IUPred as very fast and moderately accurate, while suggesting ANCHOR2 and MoRFchibi as two of the best-performing predictors for intrinsically disordered region binding. We link these tools to additional resources, including databases of predictions and web servers that integrate multiple predictive methods. Altogether, this Tutorial provides a hands-on guide to comparatively evaluating multiple predictors, submitting and collecting their own predictions, and reading and interpreting results. It is suitable for experimentalists and computational biologists interested in accurately and conveniently identifying intrinsic disorder, facilitating the functional characterization of the rapidly growing collections of protein sequences.

Subject(s)

Computational Biology , Intrinsically Disordered Proteins , Computational Biology/methods , Databases, Protein , Intrinsically Disordered Proteins/chemistry , Amino Acid Sequence

8.

High quality SNP calling using Illumina data at shallow coverage.

Malhis, Nawar; Jones, Steven J M.

Bioinformatics ; 26(8): 1029-35, 2010 Apr 15.

Article in English | MEDLINE | ID: mdl-20190250

ABSTRACT

MOTIVATION: Detection of single nucleotide polymorphisms (SNPs) has been a major application in processing second generation sequencing (SGS) data. In principle, SNPs are called on single base differences between a reference genome and a sequence generated from SGS short reads of a sample genome. However, this exercise is far from trivial; several parameters related to sequencing quality and/or reference genome properties have an essential effect on the accuracy of called SNPs, especially with shallow-coverage data. In this work, we present Slider II, an alignment and SNP calling approach that demonstrates improved algorithmic approaches enabling a larger number of called SNPs with a lower false positive rate. In addition to the regular alignment and SNP calling, as an optional feature, Slider II is capable of utilizing information about known SNPs of a target genome, as priors, in the alignment and SNP calling to enhance its capability of detecting these known SNPs and novel SNPs and mutations in their vicinity.

Subject(s)

Algorithms , Polymorphism, Single Nucleotide , Databases, Genetic , Genome , Sequence Alignment/methods , Sequence Analysis, DNA

9.

Genomic sequence of a mutant strain of Caenorhabditis elegans with an altered recombination pattern.

Rose, Ann M; O'Neil, Nigel J; Bilenky, Mikhail; Butterfield, Yaron S; Malhis, Nawar; Flibotte, Stephane; Jones, Martin R; Marra, Marco; Baillie, David L; Jones, Steven J M.

BMC Genomics ; 11: 131, 2010 Feb 23.

Article in English | MEDLINE | ID: mdl-20178641

ABSTRACT

BACKGROUND: The original sequencing and annotation of the Caenorhabditis elegans genome along with recent advances in sequencing technology provide an exceptional opportunity for the genomic analysis of wild-type and mutant strains. Using the Illumina Genome Analyzer, we sequenced the entire genome of Rec-1, a strain that alters the distribution of meiotic crossovers without changing the overall frequency. Rec-1 was derived from ethylmethane sulfonate (EMS)-treated strains, one of which had a high level of transposable element mobility. Sequencing of this strain provides an opportunity to examine the consequences on the genome of altering the distribution of meiotic recombination events. RESULTS: Using Illumina sequencing and MAQ software, 83% of the base pair sequence reads were aligned to the reference genome available at Wormbase, providing a 21-fold coverage of the genome. Using the software programs MAQ and Slider, we observed 1124 base pair differences between Rec-1 and the reference genome in Wormbase (WS190), and 441 between the mutagenized Rec-1 (BC313) and the wild-type N2 strain (VC2010). The most frequent base-substitution was G:C to A:T, 141 for the entire genome most of which were on chromosomes I or X, 55 and 31 respectively. With this data removed, no obvious pattern in the distribution of the base differences along the chromosomes was apparent. No major chromosomal rearrangements were observed, but additional insertions of transposable elements were detected. There are 11 extra copies of Tc1, and 8 of Tc2 in the Rec-1 genome, most likely the remains of past high-hopper activity in a progenitor strain. CONCLUSION: Our analysis of high-throughput sequencing was able to detect regions of direct repeat sequences, deletions, insertions of transposable elements, and base pair differences. A subset of sequence alterations affecting coding regions were confirmed by an independent approach using oligo array comparative genome hybridization. The major phenotype of the Rec-1 strain is an alteration in the preferred position of the meiotic recombination event with no other significant phenotypic consequences. In this study, we observed no evidence of a mutator effect at the nucleotide level attributable to the Rec-1 mutation.

Subject(s)

Caenorhabditis elegans/genetics , Genome, Helminth , Recombination, Genetic , Animals , Base Sequence , Comparative Genomic Hybridization , DNA Transposable Elements , DNA, Helminth/genetics , Meiosis , Molecular Sequence Data , Mutagenesis, Insertional , Repetitive Sequences, Nucleic Acid , Sequence Analysis, DNA , Software

10.

Slider--maximum use of probability information for alignment of short sequence reads and SNP detection.

Malhis, Nawar; Butterfield, Yaron S N; Ester, Martin; Jones, Steven J M.

Bioinformatics ; 25(1): 6-13, 2009 Jan 01.

Article in English | MEDLINE | ID: mdl-18974170

ABSTRACT

MOTIVATION: A plethora of alignment tools have been created that are designed to best fit different types of alignment conditions. While some of these are made for aligning Illumina Sequence Analyzer reads, none of these are fully utilizing its probability (prb) output. In this article, we will introduce a new alignment approach (Slider) that reduces the alignment problem space by utilizing each read base's probabilities given in the prb files. RESULTS: Compared with other aligners, Slider has higher alignment accuracy and efficiency. In addition, given that Slider matches bases with probabilities other than the most probable, it significantly reduces the percentage of base mismatches. The result is that its SNP predictions are more accurate than other SNP prediction approaches used today that start from the most probable sequence, including those using base quality.
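The idea of matching bases other than the most probable one can be illustrated with a minimal sketch. It is not the Slider implementation: it assumes that each read position already carries a probability for each of the four bases (real prb files store per-base scores that would first need converting), that positions are independent, and the function names `match_probability` and `most_probable_sequence` are hypothetical.

```python
from math import prod

BASES = "ACGT"  # assumed column order of the per-base probabilities


def match_probability(read_probs, candidate):
    """Probability that the read's true sequence equals `candidate`,
    assuming the bases at different positions are independent."""
    if len(read_probs) != len(candidate):
        raise ValueError("read and candidate must have the same length")
    return prod(p[BASES.index(base)] for p, base in zip(read_probs, candidate))


def most_probable_sequence(read_probs):
    """Sequence obtained by taking the single most probable base at each position."""
    return "".join(BASES[max(range(4), key=lambda i: p[i])] for p in read_probs)


# Toy read of three positions; the middle base is ambiguous between C and G.
read_probs = [
    (0.97, 0.01, 0.01, 0.01),
    (0.05, 0.50, 0.40, 0.05),
    (0.01, 0.01, 0.01, 0.97),
]

called = most_probable_sequence(read_probs)          # "ACT"
p_called = match_probability(read_probs, called)     # 0.97 * 0.50 * 0.97
p_alt = match_probability(read_probs, "AGT")         # 0.97 * 0.40 * 0.97
```

In this toy case a reference segment "AGT" is almost as well supported as the called sequence "ACT", so an aligner that keeps the full probabilities need not count the middle position as a hard mismatch.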

Subject(s)

Algorithms , Polymorphism, Single Nucleotide/genetics , Probability , Sequence Alignment/methods , Base Pair Mismatch , Base Sequence , Computational Biology , Databases, Nucleic Acid , Humans , Time Factors

11.

Protein-Protein Interactions Mediated by Intrinsically Disordered Protein Regions Are Enriched in Missense Mutations.

Wong, Eric T C; So, Victor; Guron, Mike; Kuechler, Erich R; Malhis, Nawar; Bui, Jennifer M; Gsponer, Jörg.

Biomolecules ; 10(8), 2020 07 24.

Article in English | MEDLINE | ID: mdl-32722039

ABSTRACT

Because proteins are fundamental to most biological processes, many genetic diseases can be traced back to single nucleotide variants (SNVs) that cause changes in protein sequences. However, not all SNVs that result in amino acid substitutions cause disease as each residue is under different structural and functional constraints. Influential studies have shown that protein-protein interaction interfaces are enriched in disease-associated SNVs and depleted in SNVs that are common in the general population. These studies focus primarily on folded (globular) protein domains and overlook the prevalent class of protein interactions mediated by intrinsically disordered regions (IDRs). Therefore, we investigated the enrichment patterns of missense mutation-causing SNVs that are associated with disease and cancer, as well as those present in the healthy population, in structures of IDR-mediated interactions with comparisons to classical globular interactions. When comparing the different categories of interaction interfaces, division of the interface regions into solvent-exposed rim residues and buried core residues reveals distinctive enrichment patterns for the various types of missense mutations. Most notably, we demonstrate a strong enrichment at the interface core of interacting IDRs in disease mutations and its depletion in neutral ones, which supports the view that the disruption of IDR interactions is a mechanism underlying many diseases. Intriguingly, we also found an asymmetry across the IDR interaction interface in the enrichment of certain missense mutation types, which may hint at an increased variant tolerance and urges further investigations of IDR interactions.

Subject(s)

Databases, Protein , Intrinsically Disordered Proteins/genetics , Mutation, Missense , Polymorphism, Single Nucleotide , Algorithms , Humans , Intrinsically Disordered Proteins/chemistry , Intrinsically Disordered Proteins/metabolism , Models, Molecular , Protein Binding , Protein Domains

12.

Improved measures for evolutionary conservation that exploit taxonomy distances.

Malhis, Nawar; Jones, Steven J M; Gsponer, Jörg.

Nat Commun ; 10(1): 1556, 2019 04 05.

Article in English | MEDLINE | ID: mdl-30952844

ABSTRACT

Selective pressures on protein-coding regions that provide fitness advantages can lead to the regions' fixation and conservation in genome duplications and speciation events. Consequently, conservation analyses relying on sequence similarities are exploited by a myriad of applications across all biosciences to identify functionally important protein regions. While very potent, existing conservation measures based on multiple sequence alignments are so pervasive that improvements to solutions of many problems have become incremental. We introduce a new framework for evolutionary conservation with measures that exploit taxonomy distances across species. Results show that our taxonomy-based framework comfortably outperforms existing conservation measures in identifying deleterious variants observed in the human population, including variants located in non-abundant sequence domains such as intrinsically disordered regions. The predictive power of our approach emphasizes that the phenotypic effects of sequence variants can be taxonomy-level specific and thus, conservation needs to be interpreted accordingly.

Subject(s)

Evolution, Molecular , Genetic Variation , Proteins/genetics , Classification/methods , Humans , Proteins/chemistry , Sequence Alignment , Sequence Analysis, Protein

13.

Computational Identification of MoRFs in Protein Sequences Using Hierarchical Application of Bayes Rule.

Malhis, Nawar; Wong, Eric T C; Nassar, Roy; Gsponer, Jörg.

PLoS One ; 10(10): e0141603, 2015.

Article in English | MEDLINE | ID: mdl-26517836

ABSTRACT

MOTIVATION: Intrinsically disordered regions of proteins play an essential role in the regulation of various biological processes. Key to their regulatory function is often the binding to globular protein domains via sequence elements known as molecular recognition features (MoRFs). Development of computational tools for the identification of candidate MoRF locations in amino acid sequences is an important task and an area of growing interest. Given the relative sparseness of MoRFs in protein sequences, the accuracy of the available MoRF predictors is often inadequate for practical usage, which leaves a significant need and room for improvement. In this work, we introduce MoRFCHiBi_Web, which predicts MoRF locations in protein sequences with higher accuracy compared to current MoRF predictors. METHODS: Three distinct and largely independent property scores are computed with component predictors and then combined to generate the final MoRF propensity scores. The first score reflects the likelihood of sequence windows to harbour MoRFs and is based on amino acid composition and sequence similarity information. It is generated by MoRFCHiBi using small windows of up to 40 residues in size. The second score identifies long stretches of protein disorder and is generated by ESpritz with the DisProt option. Lastly, the third score reflects residue conservation and is assembled from PSSM files generated by PSI-BLAST. These propensity scores are processed and then hierarchically combined using Bayes rule to generate the final MoRFCHiBi_Web predictions. RESULTS: MoRFCHiBi_Web was tested on three datasets. Results show that MoRFCHiBi_Web outperforms previously developed predictors by generating less than half the false positive rate for the same true positive rate at practical threshold values. This level of accuracy paired with its relatively high processing speed makes MoRFCHiBi_Web a practical tool for MoRF prediction. AVAILABILITY: http://morf.chibi.ubc.ca:8080/morf/.
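Combining several propensity scores with Bayes rule can be sketched as follows. This is a generic illustration, not the published MoRFCHiBi_Web procedure: it assumes each score has already been processed into a probability strictly between 0 and 1, that the scores are conditionally independent given the class, and that a class prior is known. The function names and the numeric values are hypothetical.

```python
def combine_bayes(p_a, p_b, prior):
    """Posterior P(MoRF | a, b) from P(MoRF | a), P(MoRF | b) and the prior
    P(MoRF), assuming a and b are conditionally independent given the class.

    In odds form: odds(a, b) = odds(a) * odds(b) / odds(prior).
    """
    for value in (p_a, p_b, prior):
        if not 0.0 < value < 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")

    def odds(p):
        return p / (1.0 - p)

    posterior_odds = odds(p_a) * odds(p_b) / odds(prior)
    return posterior_odds / (1.0 + posterior_odds)


def combine_hierarchically(scores, prior):
    """Fold a list of per-residue probabilities into one, two at a time."""
    combined = scores[0]
    for score in scores[1:]:
        combined = combine_bayes(combined, score, prior)
    return combined


# Hypothetical scores for one residue from three component predictors.
residue_scores = [0.30, 0.25, 0.40]
final = combine_hierarchically(residue_scores, prior=0.10)
```

A score equal to the prior carries no evidence and leaves the other score unchanged, while scores above the prior reinforce each other; this is why three moderately elevated scores can yield a final value higher than any single one.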

Subject(s)

Computational Biology/methods , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , Bayes Theorem , Databases, Protein , Humans , Propensity Score , Protein Structure, Tertiary , Proteins/metabolism , Sequence Homology, Amino Acid
