Results 1 - 20 of 52
1.
Clin Trials ; 21(1): 51-66, 2024 02.
Article in English | MEDLINE | ID: mdl-37937606

ABSTRACT

Numerous successful gene-targeted therapies are arising for the treatment of a variety of rare diseases. At the same time, current treatment options for neurofibromatosis 1 and schwannomatosis are limited and do not directly address loss of gene/protein function. In addition, treatments have mostly focused on symptomatic tumors, but have failed to address multisystem involvement in these conditions. Gene-targeted therapies hold promise to address these limitations. However, despite intense interest over decades, multiple preclinical and clinical issues need to be resolved before they become a reality. The optimal approaches to gene-, mRNA-, or protein restoration and to delivery to the appropriate cell types remain elusive. Preclinical models that recapitulate manifestations of neurofibromatosis 1 and schwannomatosis need to be refined. The development of validated assays for measuring neurofibromin and merlin activity in animal and human tissues will be critical for early-stage trials, as will the selection of appropriate patients, based on their individual genotypes and risk/benefit balance. Once the safety of gene-targeted therapy for symptomatic tumors has been established, the possibility of addressing a wide range of symptoms, including non-tumor manifestations, should be explored. As preclinical efforts are underway, it will be essential to educate both clinicians and those affected by neurofibromatosis 1/schwannomatosis about the risks and benefits of gene-targeted therapy for these conditions.


Subjects
Neurilemmoma , Neurofibromatoses , Neurofibromatosis 1 , Neurofibromatosis 2 , Skin Neoplasms , Animals , Humans , Neurofibromatosis 1/genetics , Neurofibromatosis 1/therapy , Neurofibromatosis 2/diagnosis , Neurofibromatosis 2/genetics , Neurofibromatosis 2/pathology , Neurofibromatoses/genetics , Neurofibromatoses/therapy , Neurofibromatoses/diagnosis , Neurilemmoma/genetics , Neurilemmoma/therapy , Neurilemmoma/diagnosis
2.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36617463

ABSTRACT

DNA and RNA sequencing technologies have revolutionized biology and biomedical sciences, sequencing full genomes and transcriptomes at very high speeds and reasonably low costs. RNA sequencing (RNA-Seq) enables transcript identification and quantification, but once sequencing has concluded researchers can be easily overwhelmed with questions such as how to go from raw data to differential expression (DE), pathway analysis and interpretation. Several pipelines and procedures have been developed to this effect. Even though there is no unique way to perform RNA-Seq analysis, it usually follows these steps: 1) raw reads quality check, 2) alignment of reads to a reference genome, 3) aligned reads' summarization according to an annotation file, 4) DE analysis and 5) gene set analysis and/or functional enrichment analysis. Each step requires researchers to make decisions, and the wide variety of options and resulting large volumes of data often lead to interpretation challenges. There also seems to be insufficient guidance on how best to obtain relevant information and derive actionable knowledge from transcription experiments. In this paper, we explain RNA-Seq steps in detail and outline differences and similarities of different popular options, as well as advantages and disadvantages. We also discuss non-coding RNA analysis, multi-omics, meta-transcriptomics and the use of artificial intelligence methods complementing the arsenal of tools available to researchers. Lastly, we perform a complete analysis from raw reads to DE and functional enrichment analysis, visually illustrating how results are not absolute truths and how algorithmic decisions can greatly impact results and interpretation.


Subjects
Artificial Intelligence , Gene Expression Profiling , Gene Expression Profiling/methods , Transcriptome , Sequence Analysis, RNA/methods , Genome , High-Throughput Nucleotide Sequencing/methods , RNA/genetics
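Step 4 of the workflow listed in the abstract above (DE analysis) can be illustrated in miniature. The following is a minimal sketch and not the paper's pipeline: it assumes a single control and a single treated sample, library-size normalization by counts per million (CPM), and a pseudocount of 1. The gene names and read counts are invented for illustration, and real DE tools additionally model replicates and test for statistical significance.

```python
import math

def counts_per_million(counts):
    """Scale the raw read counts of one sample to counts per million (CPM)."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}

def log2_fold_change(control, treated, pseudocount=1.0):
    """Per-gene log2(treated / control) on CPM values.

    The pseudocount avoids taking the logarithm of zero for unexpressed genes.
    """
    cpm_c = counts_per_million(control)
    cpm_t = counts_per_million(treated)
    return {
        gene: math.log2((cpm_t[gene] + pseudocount) / (cpm_c[gene] + pseudocount))
        for gene in control
    }

# Toy read counts (hypothetical, for illustration only)
control = {"geneA": 100, "geneB": 300, "geneC": 600}
treated = {"geneA": 400, "geneB": 300, "geneC": 300}

lfc = log2_fold_change(control, treated)
for gene, value in lfc.items():
    print(f"{gene}: log2FC = {value:+.2f}")
```

Changing the normalization or the pseudocount changes the resulting fold changes, which is a small-scale version of the abstract's point that algorithmic decisions affect results and interpretation.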
3.
Methods Mol Biol ; 2499: 205-219, 2022.
Article in English | MEDLINE | ID: mdl-35696083

ABSTRACT

Among various types of protein post-translational modifications (PTMs), lysine PTMs play an important role in regulating a wide range of functions and biological processes. Due to the generation and accumulation of enormous amount of protein sequence data by ongoing whole-genome sequencing projects, systematic identification of different types of lysine PTM substrates and their specific PTM sites in the entire proteome is increasingly important and has therefore received much attention. Accordingly, a variety of computational methods for lysine PTM identification have been developed based on the combination of various handcrafted sequence features and machine-learning techniques. In this chapter, we first briefly review existing computational methods for lysine PTM identification and then introduce a recently developed deep learning-based method, termed MUscADEL (Multiple Scalable Accurate Deep Learner for lysine PTMs). Specifically, MUscADEL employs bidirectional long short-term memory (BiLSTM) recurrent neural networks and is capable of predicting eight major types of lysine PTMs in both the human and mouse proteomes. The web server of MUscADEL is publicly available at http://muscadel.erc.monash.edu/ for the research community to use.


Subjects
Lysine , Protein Processing, Post-Translational , Amino Acid Sequence , Animals , Lysine/metabolism , Machine Learning , Mice , Proteome/metabolism
4.
Mol Ther Nucleic Acids ; 28: 261-278, 2022 Jun 14.
Article in English | MEDLINE | ID: mdl-35433111

ABSTRACT

We investigated the feasibility of utilizing an exon-skipping approach as a genotype-dependent therapeutic for neurofibromatosis type 1 (NF1) by determining which NF1 exons might be skipped while maintaining neurofibromin protein expression and GTPase activating protein (GAP)-related domain (GRD) function. Initial in silico analysis predicted exons that can be skipped with minimal loss of neurofibromin function, which was confirmed by in vitro assessments utilizing an Nf1 cDNA-based functional screening system. Skipping of exons 17 or 52 fit our criteria, as minimal effects on protein expression and GRD activity were noted. Antisense phosphorodiamidate morpholino oligomers (PMOs) were utilized to skip exon 17 in human cell lines with patient-specific pathogenic variants in exon 17 (c.1885G>A and c.1929delG). PMOs restored functional neurofibromin expression. To determine the in vivo significance of exon 17 skipping, we generated a homozygous deletion of exon 17 in a novel mouse model. Mice were viable and exhibited a normal lifespan. Initial studies did not reveal tumor development; however, altered nesting behavior and systemic lymphoid hyperplasia in peripheral lymphoid organs were noted. Alterations in T and B cell frequencies in the thymus and spleen were identified. Hence, exon skipping should be further investigated as a therapeutic approach for NF1 patients with pathogenic variants in exon 17, as homozygous deletion of exon 17 is consistent with at least partial function of neurofibromin.

5.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34729589

ABSTRACT

Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and the negative samples might be potentially mislabeled due to the limited sensitivity of the experimental equipment. The positive unlabeled (PU) learning scheme was therefore proposed to enable the classifier to learn directly from limited positive samples and a large number of unlabeled samples (i.e. a mixture of positive or negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we revisit a collection of 29 state-of-the-art PU learning bioinformatic applications to address various biological questions. Various important aspects are extensively discussed, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on the existing issues of PU learning and offer our perspectives for the future development of PU learning applications. We anticipate that our work serves as an instrumental guideline for a better understanding of the PU learning framework in bioinformatics and further developing next-generation PU learning frameworks for critical biological applications.


Subjects
Algorithms , Computational Biology , Computational Biology/methods , Supervised Machine Learning
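The PU learning setting described in the abstract above (a few labeled positives plus a large unlabeled pool) can be made concrete with a small sketch. This is one illustrative strategy only, the first step of a so-called two-step approach, and is not taken from the reviewed paper: it assumes that unlabeled samples lying farthest from the centroid of the positives are likely negatives. The function names, the `fraction` parameter and the toy points are all hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def reliable_negatives(positives, unlabeled, fraction=0.5):
    """Pick the unlabeled samples farthest from the positive centroid.

    These samples would be treated as 'reliable negatives' when training an
    ordinary binary classifier in the second step (not shown here).
    """
    center = centroid(positives)
    ranked = sorted(unlabeled, key=lambda p: math.dist(p, center), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]

# Toy 2-D feature vectors (hypothetical)
positives = [(1.0, 1.0), (1.2, 0.8)]
unlabeled = [(1.1, 0.9), (5.0, 5.0), (4.8, 5.2), (0.9, 1.1)]

negatives = reliable_negatives(positives, unlabeled, fraction=0.5)
print(negatives)
```

The remaining unlabeled samples stay unlabeled; whether a distance-based heuristic is appropriate depends on the feature space and should be checked for each application.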
6.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33774670

ABSTRACT

Antimicrobial peptides (AMPs) are a unique and diverse group of molecules that play a crucial role in a myriad of biological processes and cellular functions. AMP-related studies have become increasingly popular in recent years due to antimicrobial resistance, which is becoming an emerging global concern. Systematic experimental identification of AMPs faces many difficulties due to the limitations of current methods. Given its significance, more than 30 computational methods have been developed for accurate prediction of AMPs. These approaches show high diversity in their data set size, data quality, core algorithms, feature extraction, feature selection techniques and evaluation strategies. Here, we provide a comprehensive survey on a variety of current approaches for AMP identification and point at the differences between these methods. In addition, we evaluate the predictive performance of the surveyed tools based on an independent test data set containing 1536 AMPs and 1536 non-AMPs. Furthermore, we construct six validation data sets based on six different common AMP databases and compare different computational methods based on these data sets. The results indicate that amPEPpy achieves the best predictive performance and outperforms the other compared methods. As the predictive performances are affected by the different data sets used by different methods, we additionally perform the 5-fold cross-validation test to benchmark different traditional machine learning methods on the same data set. These cross-validation results indicate that random forest, support vector machine and eXtreme Gradient Boosting achieve comparatively better performances than other machine learning methods and are often the algorithms of choice of multiple AMP prediction tools.


Subjects
Algorithms , Computational Biology/methods , Machine Learning , Pore Forming Cytotoxic Proteins/pharmacology , Bacteria/classification , Bacteria/drug effects , Biofilms/drug effects , Biofilms/growth & development , Databases, Factual , Fungi/classification , Fungi/drug effects , Pore Forming Cytotoxic Proteins/classification , Pore Forming Cytotoxic Proteins/metabolism , Support Vector Machine , Viruses/drug effects
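The 5-fold cross-validation used in the abstract above to benchmark machine learning methods on a common data set follows a simple partitioning scheme. The sketch below shows only the index splitting, assuming a shuffled, unstratified split; the function name and the fixed seed are illustrative choices, and the model training and scoring that would happen inside the loop are omitted.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Every sample appears in exactly one test fold and in the training set of
    the remaining k - 1 folds.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(10, k=5))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} training samples, test = {sorted(test)}")
```

For class-imbalanced data a stratified split, which preserves the class ratio in every fold, is usually preferable.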
7.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32599617

ABSTRACT

Virulence factors (VFs) enable pathogens to infect their hosts. A wealth of individual, disease-focused studies has identified a wide variety of VFs, and the growing mass of bacterial genome sequence data provides an opportunity for computational methods aimed at predicting VFs. Despite their attractive advantages and performance improvements, the existing methods have some limitations and drawbacks. Firstly, as the characteristics and mechanisms of VFs are continually evolving with the emergence of antibiotic resistance, it is increasingly difficult to identify novel VFs using existing tools that were developed on outdated data sets; secondly, few systematic feature engineering efforts have been made to examine the utility of different types of features for model performance, as the majority of tools focus on extracting only a few types of features. By addressing the aforementioned issues, the accuracy of VF predictors can likely be significantly improved. This, in turn, would be particularly useful in the context of genome-wide predictions of VFs. In this work, we present a deep learning (DL)-based hybrid framework (termed DeepVF) that utilizes the stacking strategy to achieve more accurate identification of VFs. Using an enlarged, up-to-date dataset, DeepVF comprehensively explores a wide range of heterogeneous features with popular machine learning algorithms. Specifically, four classical algorithms, including random forest, support vector machines, extreme gradient boosting and multilayer perceptron, and three DL algorithms, including convolutional neural networks, long short-term memory networks and deep neural networks are employed to train 62 baseline models using these features. In order to integrate their individual strengths, DeepVF effectively combines these baseline models to construct the final meta model using the stacking strategy.
Extensive benchmarking experiments demonstrate the effectiveness of DeepVF: it achieves a more accurate and stable performance compared with baseline models on the benchmark dataset and clearly outperforms state-of-the-art VF predictors on the independent test. Using the proposed hybrid ensemble model, a user-friendly online predictor of DeepVF (http://deepvf.erc.monash.edu/) is implemented. Furthermore, its utility, from the user's viewpoint, is compared with that of existing toolkits. We believe that DeepVF will be exploited as a useful tool for screening and identifying potential VFs from protein-coding gene sequences in bacterial genomes.


Subjects
Bacteria , Bacterial Proteins/genetics , Databases, Protein , Deep Learning , Genome, Bacterial , Virulence Factors/genetics , Bacteria/genetics , Bacteria/pathogenicity
8.
Front Big Data ; 4: 727216, 2021.
Article in English | MEDLINE | ID: mdl-35118375

ABSTRACT

BACKGROUND: Simple Sequence Repeats (SSRs) are short tandem repeats of nucleotide sequences. It has been shown that SSRs are associated with human diseases and are of medical relevance. Accordingly, a variety of computational methods have been proposed to mine SSRs from genomes. Conventional methods rely on a high-quality complete genome to identify SSRs. However, the sequenced genome often misses several highly repetitive regions. Moreover, many non-model species have no complete genome available. With the recent advances of next-generation sequencing (NGS) techniques, large-scale sequence reads for any species can be rapidly generated using NGS. In this context, a number of methods have been proposed to identify thousands of SSR loci within large amounts of reads for non-model species. As the most commonly used NGS platforms (e.g., Illumina platform) on the market generally provide short paired-end reads, merging overlapping paired-end reads has become a common step prior to the identification of SSR loci. This has posed a big data analysis challenge for traditional stand-alone tools to merge short read pairs and identify SSRs from large-scale data. RESULTS: In this study, we present a new Hadoop-based software program, termed BigFiRSt, to address this problem using cutting-edge big data technology. BigFiRSt consists of two major modules, BigFLASH and BigPERF, implemented based on two state-of-the-art stand-alone tools, FLASH and PERF, respectively. BigFLASH and BigPERF address the problems of merging short read pairs and mining SSRs, respectively, in a big data manner. Comprehensive benchmarking experiments show that BigFiRSt can dramatically reduce the execution times of fast read pair merging and SSR mining from very large-scale DNA sequence data. CONCLUSIONS: The excellent performance of BigFiRSt stems mainly from its use of Hadoop big data technology to merge read pairs and mine SSRs through parallel, distributed computing on clusters.
We anticipate BigFiRSt will be a valuable tool in the coming biological Big Data era.
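To make the notion of SSR mining in the abstract above concrete, the sketch below finds perfect tandem repeats in a single DNA string with a regular expression. This is a plain single-machine illustration and is not the algorithm used by PERF, BigPERF or BigFiRSt; the function name, the default motif lengths (1 to 6 bases) and the minimum of three repeats are assumptions chosen for the example.

```python
import re

def find_ssrs(sequence, min_unit=1, max_unit=6, min_repeats=3):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    The lazy quantifier makes the regex prefer the shortest repeating motif,
    and the backreference requires that motif to recur back-to-back.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits

# Toy read (hypothetical): an (AC)4 dinucleotide repeat flanked by other bases
print(find_ssrs("TTACACACACGG"))
```

A distributed tool would apply a comparable per-read search to many reads in parallel; this sketch covers only the per-read part.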

9.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33212503

ABSTRACT

Beta-lactamases (BLs) are enzymes localized in the periplasmic space of bacterial pathogens, where they confer resistance to beta-lactam antibiotics. Experimental identification of BLs is costly yet crucial to understand beta-lactam resistance mechanisms. To address this issue, we present DeepBL, a deep learning-based approach that incorporates sequence-derived features to enable high-throughput prediction of BLs. Specifically, DeepBL is implemented based on the Small VGGNet architecture and the TensorFlow deep learning library. Furthermore, the performance of DeepBL models is investigated in relation to the sequence redundancy level and negative sample selection in the benchmark dataset. The models are trained on datasets of varying sequence redundancy thresholds, and the model performance is evaluated by extensive benchmarking tests. Using the optimized DeepBL model, we perform proteome-wide screening for all reviewed bacterial protein sequences available from the UniProt database. These results are freely accessible at the DeepBL webserver at http://deepbl.erc.monash.edu.au/.


Subjects
Computational Biology , Databases, Protein , Deep Learning , Proteome , Software , beta-Lactamases/genetics
10.
Nucleic Acids Res ; 49(D1): D651-D659, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33084862

ABSTRACT

Gram-negative bacteria utilize secretion systems to export substrates into their surrounding environment or directly into neighboring cells. These substrates are proteins that function to promote bacterial survival: by facilitating nutrient collection, disabling competitor species or, for pathogens, to disable host defenses. Following a rapid development of computational techniques, a growing number of substrates have been discovered and subsequently validated by wet lab experiments. To date, several online databases have been developed to catalogue these substrates but they have limited user options for in-depth analysis, and typically focus on a single type of secreted substrate. We therefore developed a universal platform, BastionHub, that incorporates extensive functional modules to facilitate substrate analysis and integrates the five major Gram-negative secreted substrate types (i.e. from types I-IV and VI secretion systems). To our knowledge, BastionHub is not only the most comprehensive online database available, it is also the first to incorporate substrates secreted by type I or type II secretion systems. By providing the most up-to-date details of secreted substrates and state-of-the-art prediction and visualized relationship analysis tools, BastionHub will be an important platform that can assist biologists in uncovering novel substrates and formulating new hypotheses. BastionHub is freely available at http://bastionhub.erc.monash.edu/.


Subjects
Databases as Topic , Gram-Negative Bacteria/metabolism , Data Curation , Molecular Sequence Annotation , Substrate Specificity
11.
BMC Bioinformatics ; 21(1): 493, 2020 Oct 31.
Article in English | MEDLINE | ID: mdl-33129275

ABSTRACT

BACKGROUND: Cytokines act by binding to specific receptors in the plasma membrane of target cells. Knowledge of cytokine-receptor interaction (CRI) is very important for understanding the pathogenesis of various human diseases (notably autoimmune, inflammatory and infectious diseases) and identifying potential therapeutic targets. Recently, machine learning algorithms have been used to predict CRIs. However, "Gold Standard" negative datasets are still lacking, and strong biases in negative datasets can significantly affect the training of learning algorithms and their evaluation. To mitigate the unrepresentativeness and bias inherent in the negative sample selection (non-interacting proteins), we propose a clustering-based approach for representative negative sample selection. RESULTS: We used deep autoencoders to investigate the effect of different sampling approaches for non-interacting pairs on the training and the performance of machine learning classifiers. By using the anomaly detection capabilities of deep autoencoders we deduced the effects of different categories of negative samples on the training of learning algorithms. Random sampling for selecting non-interacting pairs results in either over- or under-representation of hard or easy to classify instances. When K-means based sampling of negative datasets is applied to mitigate the inadequacies of random sampling, random forest (RF) together with the combined feature set of atomic composition, physicochemical-2grams and two different representations of evolutionary information performs best. Average model performances based on leave-one-out cross validation (loocv) over ten different negative sample sets that each model was trained with, show that RF models significantly outperform the previous best CRI predictor in terms of accuracy (+ 5.1%), specificity (+ 13%), mcc (+ 0.1) and g-means value (+ 5.1). Evaluations using tenfold cv and training/testing splits confirm the competitive performance.
CONCLUSIONS: A comparative analysis was performed to assess the effect of three different sampling methods (random, K-means and uniform sampling) on the training of learning algorithms using different evaluation methods. Models trained on K-means sampled datasets generally show a significantly improved performance compared to those trained on random selections, with RF seemingly benefiting most in our particular setting. Our findings on the sampling are highly relevant and apply to many applications of supervised learning approaches in bioinformatics.


Subjects
Receptors, Cytokine/metabolism , Algorithms , Humans , Machine Learning , Position-Specific Scoring Matrices , Reproducibility of Results
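The evaluation measures named in the abstract above (accuracy, specificity, mcc and g-means) all derive from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions, taking the g-mean as the geometric mean of sensitivity and specificity; whether the paper uses exactly this g-means definition is an assumption, and the confusion-matrix counts are invented and unrelated to the reported results.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; defined as 0 when a margin is empty
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
        "g_mean": g_mean,
    }

# Hypothetical counts for illustration only
m = binary_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because specificity and mcc depend directly on how the negative class is composed, these measures are sensitive to the negative sampling strategy the abstract investigates.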
12.
J Bioinform Comput Biol ; 18(4): 2050018, 2020 08.
Article in English | MEDLINE | ID: mdl-32501138

ABSTRACT

Background: Phosphorylation of histidine residues plays crucial roles in signaling pathways and cell metabolism in prokaryotes such as bacteria. While evidence has emerged that protein histidine phosphorylation also occurs in more complex organisms, its role in mammalian cells has remained largely uncharted. Thus, it is highly desirable to develop computational tools that are able to identify histidine phosphorylation sites. Result: Here, we introduce PROSPECT, which enables fast and accurate prediction of proteome-wide histidine phosphorylation substrates and sites. Our tool is based on a hybrid method that integrates the outputs of two convolutional neural network (CNN)-based classifiers and a random forest-based classifier. Three feature encodings, namely one-of-K coding, enhanced grouped amino acids content (EGAAC) and composition of k-spaced amino acid group pairs (CKSAAGP), were each taken as the input to one of the three classifiers. Our results show that it is able to accurately predict histidine phosphorylation sites from sequence information. Our PROSPECT web server is user-friendly and publicly available at http://PROSPECT.erc.monash.edu/. Conclusions: PROSPECT is superior to other pHis predictors in both running speed and prediction accuracy, and we anticipate that the PROSPECT webserver will become a popular tool for identifying pHis sites in bacteria.


Subjects
Histidine/metabolism , Proteome/metabolism , Software , Computational Biology/methods , Escherichia coli Proteins/metabolism , Neural Networks, Computer , Phosphorylation
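Of the three feature encodings named in the abstract above, one-of-K coding is the simplest to show. The sketch below is a generic one-hot encoder for a peptide window and is not PROSPECT's implementation: the alphabet ordering, the function name and the decision to encode unknown residues as all-zero rows are assumptions made for the example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical by letter

def one_of_k(window):
    """Encode a peptide window as a flat one-hot vector (20 values per residue).

    Residues outside the 20-letter alphabet (e.g. 'X' or padding) become
    all-zero rows.
    """
    vector = []
    for residue in window:
        row = [0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:
            row[index] = 1
        vector.extend(row)
    return vector

# Toy window centered on a histidine (hypothetical sequence)
encoded = one_of_k("AHK")
print(len(encoded), sum(encoded))
```

The resulting fixed-length binary vector is the kind of input a convolutional classifier can consume, provided all windows have the same length.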
13.
Bioinformatics ; 36(15): 4276-4282, 2020 08 01.
Article in English | MEDLINE | ID: mdl-32426818

ABSTRACT

MOTIVATION: Different from traditional linear RNAs (containing 5' and 3' ends), circular RNAs (circRNAs) are a special type of RNAs that have a closed ring structure. Accumulating evidence has indicated that circRNAs can directly bind proteins and participate in a myriad of different biological processes. RESULTS: For identifying the interaction of circRNAs with 37 different types of circRNA-binding proteins (RBPs), we develop an ensemble neural network, termed PASSION, which is based on the concatenated artificial neural network (ANN) and hybrid deep neural network frameworks. Specifically, the input of the ANN is the optimal feature subset for each RBP, which has been selected from six types of feature encoding schemes through incremental feature selection and application of the XGBoost algorithm. In turn, the input of the hybrid deep neural network is a stacked codon-based scheme. Benchmarking experiments indicate that the ensemble neural network reaches the average best area under the curve (AUC) of 0.883 across the 37 circRNA datasets when compared with XGBoost, k-nearest neighbor, support vector machine, random forest, logistic regression and Naive Bayes. Moreover, each of the 37 RBP models is extensively tested by performing independent tests, with the varying sequence similarity thresholds of 0.8, 0.7, 0.6 and 0.5, respectively. The corresponding average AUC obtained are 0.883, 0.876, 0.868 and 0.883, respectively, highlighting the effectiveness and robustness of PASSION. Extensive benchmarking experiments demonstrate that PASSION achieves a competitive performance for identifying binding sites between circRNA and RBPs, when compared with several state-of-the-art methods. AVAILABILITY AND IMPLEMENTATION: A user-friendly web server of PASSION is publicly accessible at http://flagship.erc.monash.edu/PASSION/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
RNA, Circular , RNA-Binding Proteins , Bayes Theorem , Binding Sites , Neural Networks, Computer , RNA-Binding Proteins/metabolism
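The area under the curve (AUC) reported in the abstract above has a direct probabilistic reading: it is the chance that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes the AUC of the ROC curve by comparing all positive-negative pairs, counting ties as one half. It is a generic illustration with invented scores and does not reproduce the PASSION evaluation.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Quadratic in the number of samples, which is fine for small examples;
    rank-based formulas give the same value more efficiently.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical classifier scores for binding (positive) and non-binding (negative) sites
positive_scores = [0.9, 0.8, 0.4]
negative_scores = [0.7, 0.3, 0.2]
print(f"AUC = {auc(positive_scores, negative_scores):.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.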
14.
Genomics Proteomics Bioinformatics ; 18(1): 52-64, 2020 02.
Article in English | MEDLINE | ID: mdl-32413515

ABSTRACT

Proteases are enzymes that cleave and hydrolyse the peptide bonds between two specific amino acid residues of target substrate proteins. Protease-controlled proteolysis plays a key role in the degradation and recycling of proteins, which is essential for various physiological processes. Thus, solving the substrate identification problem will have important implications for the precise understanding of functions and physiological roles of proteases, as well as for therapeutic target identification and pharmaceutical applicability. Consequently, there is a great demand for bioinformatics methods that can predict novel substrate cleavage events with high accuracy by utilizing both sequence and structural information. In this study, we present Procleave, a novel bioinformatics approach for predicting protease-specific substrates and specific cleavage sites by taking into account both their sequence and 3D structural information. Structural features of known cleavage sites were represented by discrete values using a LOWESS data-smoothing optimization method, which turned out to be critical for the performance of Procleave. The optimal approximations of all structural parameter values were encoded in a conditional random field (CRF) computational framework, alongside sequence and chemical group-based features. Here, we demonstrate the outstanding performance of Procleave through extensive benchmarking and independent tests. Procleave is capable of correctly identifying most cleavage sites in the case study. Importantly, when applied to the human structural proteome encompassing 17,628 protein structures, Procleave suggests a number of potential novel target substrates and their corresponding cleavage sites of different proteases. Procleave is implemented as a webserver and is freely accessible at http://procleave.erc.monash.edu/.


Subjects
Computational Biology/methods , Peptide Hydrolases/metabolism , Software , Algorithms , Benchmarking , Catalytic Domain , Humans , Peptide Hydrolases/chemistry , Protein Conformation , Proteolysis , Proteome/metabolism , Structure-Activity Relationship , Substrate Specificity
15.
Mol Ther Nucleic Acids ; 20: 739-753, 2020 Jun 05.
Article in English | MEDLINE | ID: mdl-32408052

ABSTRACT

Significant advances in biotechnology have led to the development of a number of different mutation-directed therapies. Some of these techniques have matured to a level that has allowed testing in clinical trials, but few have made it to approval by drug-regulatory bodies for the treatment of specific diseases. While there are still various hurdles to be overcome, recent success stories have proven the potential power of mutation-directed therapies and have fueled the hope of finding therapeutics for other genetic disorders. In this review, we summarize the state-of-the-art of various therapeutic approaches and assess their applicability to the genetic disorder neurofibromatosis type I (NF1). NF1 is caused by the loss of function of neurofibromin, a tumor suppressor and downregulator of the Ras signaling pathway. The condition is characterized by a variety of phenotypes and includes symptoms such as skin spots, nervous system tumors, skeletal dysplasia, and others. Hence, depending on the patient, therapeutics may need to target different tissues and cell types. While we also discuss the delivery of therapeutics, in particular via viral vectors and nanoparticles, our main focus is on therapeutic techniques that reconstitute functional neurofibromin, most notably cDNA replacement, CRISPR-based DNA repair, RNA repair, antisense oligonucleotide therapeutics including exon skipping, and nonsense suppression.

16.
FEBS Lett ; 594(14): 2213-2226, 2020 07.
Article in English | MEDLINE | ID: mdl-32333796

ABSTRACT

Membrane traffic between secretory and endosomal compartments is vesicle-mediated and must be tightly balanced to maintain a physiological compartment size. Vesicle formation is initiated by guanine nucleotide exchange factors (GEFs) that activate the ARF family of small GTPases. Regulatory mechanisms, including reversible phosphorylation, allow ARF-GEFs to support vesicle formation only at the right time and place in response to cellular needs. Here, we review current knowledge of how the Golgi-specific brefeldin A-resistance factor 1 (GBF1)/brefeldin A-inhibited guanine nucleotide exchange protein (BIG) family of ARF-GEFs is influenced by phosphorylation and use predictive paradigms to propose new regulatory paradigms. We describe a conserved cluster of phosphorylation sites within the N-terminal domains of the GBF1/BIG ARF-GEFs and suggest that these sites may respond to homeostatic signals related to cell growth and division. In the C-terminal region, GBF1 shows phosphorylation sites clustered differently as compared with the similar configuration found in both BIG1 and BIG2. Despite this similarity, BIG1 and BIG2 phosphorylation patterns are divergent in other domains. The different clustering of phosphorylation sites suggests that the nonconserved sites may represent distinct regulatory nodes and specify the function of GBF1, BIG1, and BIG2.


Subjects
Guanine Nucleotide Exchange Factors/metabolism , Animals , Guanine Nucleotide Exchange Factors/chemistry , Humans , Phosphoproteins/chemistry , Phosphoproteins/metabolism , Phosphorylation
17.
AMIA Annu Symp Proc ; 2020: 1325-1334, 2020.
Article in English | MEDLINE | ID: mdl-33936509

ABSTRACT

Recent research in predicting protein secondary structure populations (SSP) based on Nuclear Magnetic Resonance (NMR) chemical shifts has helped quantitatively characterise the structural conformational properties of intrinsically disordered proteins and regions (IDP/IDR). Unlike protein secondary structure (SS) prediction, SSP prediction assumes a dynamic assignment of secondary structures that seem to correlate with disordered states. In this study, we designed single-task deep learning frameworks to predict IDP/IDR and SSP separately, and multitask deep learning frameworks to allow quantitative predictions of IDP/IDR evidenced by the simultaneously predicted SSP. According to independent test results, single-task deep learning models improve on the prediction performance of shallow models for SSP and IDP/IDR. The performance of IDP/IDR prediction was further improved when SSP was simultaneously predicted in multitask models. With p53 as a use case, we demonstrate how predicted SSP is used to explain the IDP/IDR predictions for each functional region.


Subjects
Deep Learning , Intrinsically Disordered Proteins/chemistry , Protein Structure, Secondary
18.
Brief Bioinform ; 21(4): 1119-1135, 2020 07 15.
Article in English | MEDLINE | ID: mdl-31204427

ABSTRACT

Human leukocyte antigen class I (HLA-I) molecules are encoded by major histocompatibility complex (MHC) class I loci in humans. The binding and interaction between HLA-I molecules and intracellular peptides derived from a variety of proteolytic mechanisms play a crucial role in subsequent T-cell recognition of target cells and the specificity of the immune response. In this context, tools that predict the likelihood for a peptide to bind to specific HLA class I allotypes are important for selecting the most promising antigenic targets for immunotherapy. In this article, we comprehensively review a variety of currently available tools for predicting the binding of peptides to a selection of HLA-I allomorphs. Specifically, we compare their calculation methods for the prediction score, employed algorithms, evaluation strategies and software functionalities. In addition, we have evaluated the prediction performance of the reviewed tools based on an independent validation data set, containing 21 101 experimentally verified ligands across 19 HLA-I allotypes. The benchmarking results show that MixMHCpred 2.0.1 achieves the best performance for predicting peptides binding to most of the HLA-I allomorphs studied, while NetMHCpan 4.0 and NetMHCcons 1.1 outperform the other machine learning-based and consensus-based tools, respectively. Importantly, it should be noted that a peptide predicted with a higher binding score for a specific HLA allotype does not necessarily imply it will be immunogenic. That said, peptide-binding predictors are still very useful in that they can help to significantly reduce the large number of epitope candidates that need to be experimentally verified. Several other factors, including susceptibility to proteasome cleavage, peptide transport into the endoplasmic reticulum and T-cell receptor repertoire, also contribute to the immunogenicity of peptide antigens, and some of them can be considered by some predictors. 
Therefore, integrating features derived from these additional factors together with HLA-binding properties by using machine-learning algorithms may increase the prediction accuracy of immunogenic peptides. As such, we anticipate that this review and benchmarking survey will assist researchers in selecting appropriate prediction tools that best suit their purposes and provide useful guidelines for the development of improved antigen predictors in the future.


Subjects
Computational Biology/methods , Histocompatibility Antigens Class I/metabolism , Algorithms , Datasets as Topic , Histocompatibility Antigens Class I/chemistry , Humans , Machine Learning , Reproducibility of Results
19.
Brief Bioinform ; 21(3): 1069-1079, 2020 05 21.
Article in English | MEDLINE | ID: mdl-31161204

ABSTRACT

Post-translational modifications (PTMs) play very important roles in various cell signaling pathways and biological processes. Because of these roles, many major PTMs have been studied, and their functional and mechanistic characterization is well documented in several databases. However, most currently available databases focus mainly on protein sequences, while the real 3D structures of PTMs have been largely ignored. Studies of the 3D structural signatures of PTMs have therefore been severely limited by the lack of data. Here, we develop PRISMOID, a novel, publicly available and free 3D structure database for a wide range of PTMs. PRISMOID is an up-to-date and interactive online knowledge base with a specific focus on the 3D structural contexts of PTM sites and on mutations with functional impact that occur at or in close proximity to PTM sites. The first version of PRISMOID encompasses 17 145 non-redundant modification sites on 3919 related protein 3D structure entries pertaining to 37 different types of PTMs. Our entry web page is organized in a comprehensive manner, including detailed PTM annotation on the 3D structure and biological information in terms of mutations affecting PTMs, secondary structure features and per-residue solvent accessibility features of PTM sites, domain context, predicted natively disordered regions and sequence alignments. In addition, high-definition JavaScript packages are employed to enhance information visualization in PRISMOID. PRISMOID provides a variety of interactive and customizable search options and data browsing functions; these capabilities allow users to access data via keyword, ID and combined advanced-option searches in an efficient and user-friendly way. A download page is also provided to enable users to download the SQL file, computational structural features and PTM site data.
We anticipate PRISMOID will swiftly become an invaluable online resource, assisting both biologists and bioinformaticians to conduct experiments and develop applications supporting discovery efforts in the sequence-structural-functional relationship of PTMs and providing important insight into mutations and PTM sites interaction mechanisms. The PRISMOID database is freely accessible at http://prismoid.erc.monash.edu/. The database and web interface are implemented in MySQL, JSP, JavaScript and HTML with all major browsers supported.


Subjects
Databases, Protein , Mutation , Protein Processing, Post-Translational , Proteins/chemistry , Protein Conformation
20.
Bioinformatics ; 36(3): 704-712, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31393553

ABSTRACT

MOTIVATION: Gram-positive bacteria have developed secretion systems to transport proteins across their cell wall, a process that plays an important role during host infection. These secretion mechanisms have also been harnessed for therapeutic purposes in many biotechnology applications. Accordingly, the identification of features that select a protein for efficient secretion from these microorganisms has become an important task. Among all the secreted proteins, 'non-classical' secreted proteins are difficult to identify as they lack discernable signal peptide sequences and can make use of diverse secretion pathways. Currently, several computational methods have been developed to facilitate the discovery of such non-classical secreted proteins; however, the existing methods are based on either simulated or limited experimental datasets. In addition, they often employ basic features to train the models in a simple and coarse-grained manner. The availability of more experimentally validated datasets, advanced feature engineering techniques and novel machine learning approaches creates new opportunities for the development of improved predictors of 'non-classical' secreted proteins from sequence data. RESULTS: In this work, we first constructed a high-quality dataset of experimentally verified 'non-classical' secreted proteins, which we then used to create benchmark datasets. Using these benchmark datasets, we comprehensively analyzed a wide range of features and assessed their individual performance. Subsequently, we developed a two-layer Light Gradient Boosting Machine (LightGBM) ensemble model that integrates several single feature-based models into an overall prediction framework. At this stage, LightGBM, a gradient boosting machine, was used as a machine learning approach and the necessary parameter optimization was performed by a particle swarm optimization strategy. 
All single feature-based LightGBM models were then integrated into a unified ensemble model to further improve the predictive performance. Consequently, the final ensemble model achieved superior performance, with an accuracy of 0.900, an F-value of 0.903, a Matthews correlation coefficient of 0.803 and an area under the curve value of 0.963, outperforming previous state-of-the-art predictors on the independent test. Based on our proposed optimal ensemble model, we further developed an accessible online predictor, PeNGaRoo, to serve users' demands. We believe this online web server, together with our proposed methodology, will expedite the discovery of non-classically secreted effector proteins in Gram-positive bacteria and further inspire the development of next-generation predictors. AVAILABILITY AND IMPLEMENTATION: http://pengaroo.erc.monash.edu/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
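
For readers unfamiliar with the four measures reported in this abstract, the following is a minimal sketch of how accuracy, F-value, the Matthews correlation coefficient and the area under the ROC curve are conventionally computed for a binary classifier. It is not the authors' code: the function names `classification_metrics` and `pairwise_auc` and the example counts and scores are hypothetical, chosen only for illustration, and the F-value is assumed here to be the standard F1 score.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, F-value (assumed to be F1) and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f_value = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # MCC is conventionally set to 0 when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f_value": f_value, "mcc": mcc}


def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical counts and scores, not data from the paper.
    print(classification_metrics(tp=45, fp=5, tn=40, fn=10))
    print(pairwise_auc([0.9, 0.7, 0.6, 0.2], [1, 0, 1, 0]))
```

The pairwise form of the AUC is quadratic in the number of examples; it is used here because it makes the definition explicit, not because it is efficient.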


Subjects
Algorithms , Machine Learning , Computational Biology , Peptides , Proteins
...