1.
Comput Struct Biotechnol J ; 21: 1678-1687, 2023.
Article in English | MEDLINE | ID: mdl-36890882

ABSTRACT

Immunopeptidomics has made tremendous contributions to our understanding of antigen processing and presentation, by identifying and quantifying antigenic peptides presented on the cell surface by Major Histocompatibility Complex (MHC) molecules. Large and complex immunopeptidomics datasets can now be routinely generated using Liquid Chromatography-Mass Spectrometry techniques. The analysis of this data - often consisting of multiple replicates/conditions - rarely follows a standard data processing pipeline, hindering the reproducibility and depth of analysis of immunopeptidomic data. Here, we present Immunolyser, an automated pipeline designed to facilitate computational analysis of immunopeptidomic data with a minimal initial setup. Immunolyser brings together routine analyses, including peptide length distribution, peptide motif analysis, sequence clustering, peptide-MHC binding affinity prediction, and source protein analysis. Immunolyser provides a user-friendly and interactive interface via its webserver and is freely available for academic purposes at https://immunolyser.erc.monash.edu/. The open-access source code can be downloaded at our GitHub repository: https://github.com/prmunday/Immunolyser. We anticipate that Immunolyser will serve as a prominent computational pipeline to facilitate effortless and reproducible analysis of immunopeptidomic data.
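
One of the routine analyses named in this abstract, the peptide length distribution, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Immunolyser's own code: the function name `length_distribution` is hypothetical, and the example peptides are sample strings used only to show the shape of the output.

```python
from collections import Counter


def length_distribution(peptides):
    """Return {peptide length: fraction of peptides} for a list of sequences."""
    counts = Counter(len(p) for p in peptides)
    total = sum(counts.values())
    if total == 0:
        return {}  # no peptides supplied
    return {length: counts[length] / total for length in sorted(counts)}


# Sample strings for illustration only (assumption, not data from the paper).
example = ["SIINFEKL", "GILGFVFTL", "NLVPMVATV", "ELAGIGILTV"]
print(length_distribution(example))
```

In practice the distribution would be computed per replicate or condition, so that samples can be compared side by side.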

2.
Comput Struct Biotechnol J ; 21: 1272-1282, 2023.
Article in English | MEDLINE | ID: mdl-36814721

ABSTRACT

T cells expressing either alpha-beta or gamma-delta T cell receptors (TCR) are critical sentinels of the adaptive immune system, with receptor diversity being essential for protective immunity against a broad array of pathogens and agents. Programs available to profile TCR clonotypic signatures can be limiting for users with no coding expertise. Current analytical pipelines can be inefficient due to manual processing steps, are open to data entry errors, and involve multiple analytical tools with unique inputs that require coding expertise. Here we present a bespoke webtool, coined 'TCR_Explore', designed for users irrespective of coding expertise, enabling analysis of data derived via either Sanger sequencing or next generation sequencing (NGS) platforms. Further, TCR_Explore incorporates automated quality control steps for Sanger sequencing. The creation of flexible and publication-ready figures is enabled for different sequencing platforms following universal conversion to the TCR_Explore file format. TCR_Explore will enhance a user's capacity to undertake in-depth TCR repertoire analysis of both new and pre-existing datasets for identification of T cell clonotypes associated with health and disease. The web application is located at https://tcr-explore.erc.monash.edu for users to interactively explore TCR repertoire datasets.

3.
BMC Bioinformatics ; 23(1): 69, 2022 Feb 14.
Article in English | MEDLINE | ID: mdl-35164667

ABSTRACT

BACKGROUND: Gene ontology (GO) enrichment analysis is frequently undertaken during exploration of various -omics data sets. Despite the wide array of tools available to biologists to perform this analysis, meaningful visualisation of the overrepresented GO terms in a manner which is easy to interpret is still lacking. RESULTS: Monash Gene Ontology (MonaGO) is a novel web-based visualisation system that provides an intuitive, interactive and responsive interface for performing GO enrichment analysis and visualising the results. MonaGO supports gene lists as well as GO terms as inputs. Visualisation results can be exported as high-resolution images or restored in new sessions, allowing reproducibility of the analysis. An extensive comparison between MonaGO and 11 state-of-the-art GO enrichment visualisation tools based on 9 features revealed that MonaGO is a unique platform that simultaneously allows interactive visualisation within one single output page, directly accessible through a web browser with customisable display options. CONCLUSION: MonaGO combines dynamic clustering and interactive visualisation as well as customisation options to assist biologists in obtaining meaningful representation of overrepresented GO terms, producing simplified outputs in an unbiased manner. MonaGO will facilitate the interpretation of GO analysis and assist biologists in representing the results.


Subjects
Software; Cluster Analysis; Gene Ontology; Probability; Reproducibility of Results
4.
Comput Struct Biotechnol J ; 19: 5735-5740, 2021.
Article in English | MEDLINE | ID: mdl-34745458

ABSTRACT

Volcano and other analytical plots (e.g., correlation plots, upset plots, and heatmaps) serve as important data visualization methods for transcriptomic and proteomic analyses. Customizable generation of these plots is fundamentally important for a better understanding of dysregulated expression data and is therefore instrumental for the ensuing pathway analysis and biomarker identification. Here, we present an R-based Shiny application, termed ggVolcanoR, to allow for customizable generation and visualization of volcano plots, correlation plots, upset plots, and heatmaps for differential expression datasets, via a user-friendly interactive interface in both a local executable version and a web-based application, without requiring programming expertise. Compared to currently existing packages, ggVolcanoR offers more practical options to optimize the generation of publication-quality volcano and other analytical plots for analyzing and comparing dysregulated genes/proteins across multiple differential expression datasets. In addition, ggVolcanoR provides an option to download the customized list of the filtered dysregulated expression data, which can be directly used as input for downstream pathway analysis. The source code of ggVolcanoR is available at https://github.com/KerryAM-R/ggVolcanoR and the webserver of ggVolcanoR 1.0 has been deployed and is freely available for academic purposes at https://ggvolcanor.erc.monash.edu/.
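
The filtering step behind a volcano plot can be illustrated with a short Python sketch. This is not ggVolcanoR's code (the tool itself is an R Shiny application); the function names and the default cutoffs of |log2 fold change| >= 1 and p < 0.05 are assumptions chosen because they are common conventions, and a real analysis would normally use adjusted p-values.

```python
import math


def classify(log2_fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Label one gene/protein using fold-change and significance thresholds.

    The default cutoffs are illustrative assumptions, not ggVolcanoR settings.
    """
    if p_value < p_cutoff and log2_fc >= fc_cutoff:
        return "up"
    if p_value < p_cutoff and log2_fc <= -fc_cutoff:
        return "down"
    return "not significant"


def volcano_point(log2_fc, p_value):
    """Return (x, y) plot coordinates; p_value must be greater than zero."""
    return log2_fc, -math.log10(p_value)


print(classify(2.0, 0.001), volcano_point(2.0, 0.001))
```

The entries labelled "up" or "down" correspond to the kind of filtered list that could be passed to a downstream pathway analysis.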

5.
Int J Mol Sci ; 22(6)2021 Mar 17.
Article in English | MEDLINE | ID: mdl-33803033

ABSTRACT

Both protease- and reactive oxygen species (ROS)-mediated proteolysis are thought to be key effectors of tissue remodeling. We have previously shown that comparison of amino acid composition can predict the differential susceptibilities of proteins to photo-oxidation. However, predicting protein susceptibility to endogenous proteases remains challenging. Here, we aim to develop bioinformatics tools to (i) predict cleavage site locations (and hence putative protein susceptibilities) and (ii) compare the predicted vulnerabilities of skin proteins to protease- and ROS-mediated proteolysis. The first goal of this study was to experimentally evaluate the ability of existing protease cleavage site prediction models (PROSPER and DeepCleave) to identify experimentally determined MMP9 cleavage sites in two purified proteins and in a complex human dermal fibroblast-derived extracellular matrix (ECM) proteome. We subsequently developed deep bidirectional recurrent neural network (BRNN) models to predict cleavage sites for 14 tissue proteases. The predictions of the new models were tested against experimental datasets and combined with amino acid composition analysis (to predict ultraviolet radiation (UVR)/ROS susceptibility) in a new web app: the Manchester proteome susceptibility calculator (MPSC). The BRNN models performed better in predicting cleavage sites in native dermal ECM proteins than existing models (DeepCleave and PROSPER), and application of MPSC to the skin proteome suggests that, compared with the elastic fiber network, fibrillar collagens may be susceptible primarily to protease-mediated proteolysis. We also identify additional putative targets of oxidative damage (dermatopontin, fibulins and defensins) and protease action (laminins and nidogen). MPSC has the potential to identify targets of proteolysis in disparate tissues and disease states.


Subjects
Deep Learning; Proteolysis; Proteome/metabolism; Amino Acids/metabolism; Extracellular Matrix Proteins/metabolism; Humans; Neural Networks, Computer; Peptide Hydrolases/metabolism; Proteolysis/radiation effects; Reactive Oxygen Species/metabolism; Reproducibility of Results; Software; Ultraviolet Rays
6.
Proteomics ; 21(17-18): e2100036, 2021 09.
Article in English | MEDLINE | ID: mdl-33811468

ABSTRACT

SARS-CoV-2 has caused a significant ongoing pandemic worldwide. A number of studies have examined the T cell mediated immune responses against SARS-CoV-2, identifying potential T cell epitopes derived from the SARS-CoV-2 proteome. Such studies will aid in identifying targets for vaccination and immune monitoring. In this study, we applied tandem mass spectrometry and proteomic techniques to a library of ∼40,000 synthetic peptides, in order to generate a large dataset of SARS-CoV-2 derived peptide MS/MS spectra. On this basis, we built an online knowledgebase, termed virusMS (https://virusms.erc.monash.edu/), to document, annotate and analyse these synthetic peptides and their spectral information. VirusMS incorporates a user-friendly interface to facilitate searching, browsing and downloading the database content. Detailed annotations of the peptides, including experimental information, peptide modifications, predicted peptide-HLA (human leukocyte antigen) binding affinities, and peptide MS/MS spectral data, are provided in virusMS.


Subjects
COVID-19; SARS-CoV-2; Humans; Peptides; Proteomics; Tandem Mass Spectrometry
7.
Front Big Data ; 4: 727216, 2021.
Article in English | MEDLINE | ID: mdl-35118375

ABSTRACT

BACKGROUND: Simple Sequence Repeats (SSRs) are short tandem repeats of nucleotide sequences. It has been shown that SSRs are associated with human diseases and are of medical relevance. Accordingly, a variety of computational methods have been proposed to mine SSRs from genomes. Conventional methods rely on a high-quality complete genome to identify SSRs. However, the sequenced genome often misses several highly repetitive regions. Moreover, many non-model species have no complete genome available. With the recent advances in next-generation sequencing (NGS) techniques, large-scale sequence reads for any species can be rapidly generated using NGS. In this context, a number of methods have been proposed to identify thousands of SSR loci within large amounts of reads for non-model species. While the most commonly used NGS platforms (e.g., Illumina platform) on the market generally provide short paired-end reads, merging overlapping paired-end reads has become a common step prior to the identification of SSR loci. This has posed a big data analysis challenge for traditional stand-alone tools to merge short read pairs and identify SSRs from large-scale data. RESULTS: In this study, we present a new Hadoop-based software program, termed BigFiRSt, to address this problem using cutting-edge big data technology. BigFiRSt consists of two major modules, BigFLASH and BigPERF, implemented based on two state-of-the-art stand-alone tools, FLASH and PERF, respectively. BigFLASH and BigPERF address the problems of merging short read pairs and mining SSRs, respectively, in a big data manner. Comprehensive benchmarking experiments show that BigFiRSt can dramatically reduce the execution times of fast read-pair merging and SSR mining from very large-scale DNA sequence data. CONCLUSIONS: The excellent performance of BigFiRSt is mainly attributable to its use of Hadoop big data technology to merge read pairs and mine SSRs in parallel, using distributed computing on clusters. We anticipate BigFiRSt will be a valuable tool in the coming biological Big Data era.
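
To make the notion of SSR mining concrete, the sketch below scans a single read for perfect tandem repeats with a regular expression. It is a single-process illustration only: it does not reproduce the PERF algorithm or the Hadoop-based distribution that BigFiRSt uses, and the function name `find_ssrs` and its default parameters (motifs of 1 to 6 bases, at least 3 repeats) are assumptions made for this example.

```python
import re


def find_ssrs(sequence, min_repeats=3, max_motif_len=6):
    """Find perfect tandem repeats in a DNA string.

    Returns a list of (start index, motif, repeat count) tuples.
    min_repeats must be at least 2. The lazy quantifier makes the regex
    try the shortest motif first, so "AAAAAA" is reported as motif "A".
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif_len, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


print(find_ssrs("GGACACACACTT"))
```

Because each read can be scanned independently, this kind of per-read work is what a framework such as Hadoop can spread across a cluster.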

8.
Genet Med ; 22(11): 1883-1886, 2020 11.
Article in English | MEDLINE | ID: mdl-32606442

ABSTRACT

PURPOSE: To measure the prevalence of medically actionable pathogenic variants (PVs) among a population of healthy elderly individuals. METHODS: We used targeted sequencing to detect pathogenic or likely pathogenic variants in 55 genes associated with autosomal dominant medically actionable conditions, among a population of 13,131 individuals aged 70 or older (mean age 75 years) enrolled in the ASPirin in Reducing Events in the Elderly (ASPREE) trial. Participants had no previous diagnosis or current symptoms of cardiovascular disease, physical disability or dementia, and no current diagnosis of life-threatening cancer. Variant curation followed American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) standards. RESULTS: One in 75 (1.3%) healthy elderly individuals carried a PV. This was lower than rates reported from population-based studies, which have ranged from 1.8% to 3.4%. We detected 20 PV carriers for Lynch syndrome (MSH6/MLH1/MSH2/PMS2) and 13 for familial hypercholesterolemia (LDLR/APOB/PCSK9). Among 7056 female participants, we detected 15 BRCA1/BRCA2 PV carriers (1 in 470 females). We detected 86 carriers of PVs in lower-penetrance genes associated with inherited cardiac disorders. CONCLUSION: Medically actionable PVs are carried in a healthy elderly population. Our findings raise questions about the actionability of lower-penetrance genes, especially when PVs are detected in the absence of symptoms and/or family history of disease.


Subjects
Colorectal Neoplasms, Hereditary Nonpolyposis; Proprotein Convertase 9; Aged; Colorectal Neoplasms, Hereditary Nonpolyposis/genetics; Female; Genes, BRCA2; Genetic Predisposition to Disease; Humans
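
The rates quoted in the abstract of this record can be checked with simple arithmetic. The sketch below uses only figures stated in the abstract (15 BRCA1/BRCA2 carriers among 7056 female participants, and a prevalence of one in 75); the helper names `one_in_n` and `percent` are hypothetical.

```python
def one_in_n(carriers, population):
    """Express a carrier count as a '1 in N' rate, rounded to the nearest integer."""
    return round(population / carriers)


def percent(carriers, population, digits=1):
    """Express a carrier count as a percentage of the population."""
    return round(100 * carriers / population, digits)


print(one_in_n(15, 7056))  # 7056 / 15 = 470.4, i.e. about 1 in 470
print(percent(1, 75))      # 1 / 75 = 1.33...%, reported as 1.3%
```

Both results agree with the values reported in the abstract.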
9.
Bioinformatics ; 36(4): 1057-1065, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31566664

ABSTRACT

MOTIVATION: Proteases are enzymes that cleave target substrate proteins by catalyzing the hydrolysis of peptide bonds between specific amino acids. While the functional proteolysis regulated by proteases plays a central role in 'life and death' cellular processes, many of the corresponding substrates and their cleavage sites have not yet been found. Availability of accurate predictors of the substrates and cleavage sites would facilitate understanding of proteases' functions and physiological roles. Deep learning is a promising approach for the development of accurate predictors of substrate cleavage events. RESULTS: We propose DeepCleave, the first deep learning-based predictor of protease-specific substrates and cleavage sites. DeepCleave uses protein substrate sequence data as input and employs convolutional neural networks with transfer learning to train accurate predictive models. High predictive performance of our models stems from the use of high-quality cleavage site features extracted from the substrate sequences through the deep learning process, and the application of transfer learning, multiple kernels and an attention layer in the design of the deep network. Empirical tests against several related state-of-the-art methods demonstrate that DeepCleave outperforms these methods in predicting caspase and matrix metalloprotease substrate-cleavage sites. AVAILABILITY AND IMPLEMENTATION: The DeepCleave webserver and source code are freely available at http://deepcleave.erc.monash.edu/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning; Caspases; Metalloproteases; Software; Substrate Specificity
10.
Brief Bioinform ; 21(3): 1069-1079, 2020 05 21.
Article in English | MEDLINE | ID: mdl-31161204

ABSTRACT

Post-translational modifications (PTMs) play very important roles in various cell signaling pathways and biological processes. Due to PTMs' extremely important roles, many major PTMs have been studied, and the functional and mechanical characterization of major PTMs is well documented in several databases. However, most currently available databases mainly focus on protein sequences, while the real 3D structures of PTMs have been largely ignored. Therefore, studies of the 3D structural signatures of PTMs have been severely limited by the deficiency of the data. Here, we develop PRISMOID, a novel publicly available and free 3D structure database for a wide range of PTMs. PRISMOID represents an up-to-date and interactive online knowledge base with a specific focus on the 3D structural contexts of PTM sites and on mutations with functional impact that occur at PTM sites or in their close proximity. The first version of PRISMOID encompasses 17 145 non-redundant modification sites on 3919 related protein 3D structure entries pertaining to 37 different types of PTMs. Our entry web page is organized in a comprehensive manner, including detailed PTM annotation on the 3D structure and biological information in terms of mutations affecting PTMs, secondary structure features and per-residue solvent accessibility features of PTM sites, domain context, predicted natively disordered regions and sequence alignments. In addition, high-definition JavaScript packages are employed to enhance information visualization in PRISMOID. PRISMOID is equipped with a variety of interactive and customizable search options and data browsing functions; these capabilities allow users to access data via keyword, ID and combined advanced-option searches in an efficient and user-friendly way. A download page is also provided to enable users to download the SQL file, computational structural features and PTM site data. We anticipate PRISMOID will swiftly become an invaluable online resource, assisting both biologists and bioinformaticians to conduct experiments and develop applications supporting discovery efforts in the sequence-structural-functional relationship of PTMs and providing important insight into the mechanisms of interaction between mutations and PTM sites. The PRISMOID database is freely accessible at http://prismoid.erc.monash.edu/. The database and web interface are implemented in MySQL, JSP, JavaScript and HTML with all major browsers supported.


Subjects
Databases, Protein; Mutation; Protein Processing, Post-Translational; Proteins/chemistry; Protein Conformation
11.
Brief Bioinform ; 21(3): 1047-1057, 2020 05 21.
Article in English | MEDLINE | ID: mdl-31067315

ABSTRACT

With the explosive growth of biological sequences generated in the post-genomic era, one of the most challenging problems in bioinformatics and computational biology is to computationally characterize sequences, structures and functions in an efficient, accurate and high-throughput manner. A number of online web servers and stand-alone tools have been developed to address this to date; however, all these tools have their limitations and drawbacks in terms of their effectiveness, user-friendliness and capacity. Here, we present iLearn, a comprehensive and versatile Python-based toolkit, integrating the functionality of feature extraction, clustering, normalization, selection, dimensionality reduction, predictor construction, best descriptor/model selection, ensemble learning and results visualization for DNA, RNA and protein sequences. iLearn was designed for users that only want to upload their data set and select the functions they need calculated from it, while all necessary procedures and optimal settings are completed automatically by the software. iLearn includes a variety of descriptors for DNA, RNA and proteins, and four feature output formats are supported so as to facilitate direct output usage or communication with other computational tools. In total, iLearn encompasses 16 different types of feature clustering, selection, normalization and dimensionality reduction algorithms, and five commonly used machine-learning algorithms, thereby greatly facilitating feature analysis and predictor construction. iLearn is made freely available via an online web server and a stand-alone toolkit.


Subjects
DNA/chemistry; Machine Learning; Proteins/chemistry; RNA/chemistry; Sequence Analysis/methods; Algorithms; Internet
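
As an illustration of what a sequence feature descriptor looks like in the iLearn record above, the sketch below computes amino acid composition, one of the simplest protein descriptors. It is a generic example written for this listing, not code taken from iLearn; the function name and the handling of non-standard residues (they are ignored) are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def amino_acid_composition(sequence):
    """Return a 20-element vector of residue frequencies for a protein.

    Characters outside the 20 standard residues are ignored (an assumption
    made for this sketch).
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


vector = amino_acid_composition("AACD")
print(vector[:3])  # frequencies of A, C and D
```

A fixed-length vector of this kind can be fed directly to clustering, feature selection or a machine-learning classifier, regardless of the length of the input sequence.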
12.
Brief Bioinform ; 20(6): 2150-2166, 2019 11 27.
Article in English | MEDLINE | ID: mdl-30184176

ABSTRACT

The roles of proteolytic cleavage have been intensively investigated and discussed during the past two decades. This irreversible chemical process has been frequently reported to influence a number of crucial biological processes (BPs), such as cell cycle, protein regulation and inflammation. A number of advanced studies have been published aiming at deciphering the mechanisms of proteolytic cleavage. Given its significance and the large number of functionally enriched substrates targeted by specific proteases, many computational approaches have been established for accurate prediction of protease-specific substrates and their cleavage sites. Consequently, there is an urgent need to systematically assess the state-of-the-art computational approaches for protease-specific cleavage site prediction to further advance the existing methodologies and to improve the prediction performance. With this goal in mind, in this article, we carefully evaluated a total of 19 computational methods (including 8 scoring function-based methods and 11 machine learning-based methods) in terms of their underlying algorithm, calculated features, performance evaluation and software usability. Then, extensive independent tests were performed to assess the robustness and scalability of the reviewed methods using our carefully prepared independent test data sets with 3641 cleavage sites (specific to 10 proteases). The comparative experimental results demonstrate that PROSPERous is the most accurate generic method for predicting eight protease-specific cleavage sites, while GPS-CCD and LabCaS outperformed other predictors for calpain-specific cleavage sites. Based on our review, we then outlined some potential ways to improve the prediction performance and ease the computational burden by applying ensemble learning, deep learning, positive unlabeled learning and parallel and distributed computing techniques. We anticipate that our study will serve as a practical and useful guide for interested readers to further advance next-generation bioinformatics tools for protease-specific cleavage site prediction.


Subjects
Benchmarking; Computational Biology; Peptide Hydrolases/metabolism; Research; Algorithms; Machine Learning; Substrate Specificity
13.
BMC Genomics ; 19(1): 238, 2018 Apr 05.
Article in English | MEDLINE | ID: mdl-29621972

ABSTRACT

BACKGROUND: A strong focus of the post-genomic era is mining of the non-coding regulatory genome in order to unravel the function of regulatory elements that coordinate gene expression (Nat 489:57-74, 2012; Nat 507:462-70, 2014; Nat 507:455-61, 2014; Nat 518:317-30, 2015). Whole-genome approaches based on next-generation sequencing (NGS) have provided insight into the genomic location of regulatory elements throughout different cell types, organs and organisms. These technologies are now widespread and commonly used in laboratories from various fields of research. This highlights the need for fast and user-friendly software tools dedicated to extracting cis-regulatory information contained in these regulatory regions; for instance, transcription factor binding site (TFBS) composition. Ideally, such tools should not require prior programming knowledge to ensure they are accessible for all users. RESULTS: We present TrawlerWeb, a web-based version of the Trawler_standalone tool (Nat Methods 4:563-5, 2007; Nat Protoc 5:323-34, 2010), to allow for the identification of enriched motifs in DNA sequences obtained from next-generation sequencing experiments in order to predict their TFBS composition. TrawlerWeb is designed for online queries with standard options common to web-based motif discovery tools. In addition, TrawlerWeb provides three unique new features: 1) TrawlerWeb allows the input of BED files directly generated from NGS experiments, 2) it automatically generates an input-matched biologically relevant background, and 3) it displays resulting conservation scores for each instance of the motif found in the input sequences, which assists the researcher in prioritising the motifs to validate experimentally. Finally, to date, this web-based version of Trawler_standalone remains the fastest online de novo motif discovery tool compared to other popular web-based software, while generating predictions with high accuracy. CONCLUSIONS: TrawlerWeb provides users with a fast, simple and easy-to-use web interface for de novo motif discovery. This will assist in rapidly analysing NGS datasets that are now being routinely generated. TrawlerWeb is freely available and accessible at: http://trawler.erc.monash.edu.au.


Subjects
High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Software; Animals; Base Sequence; Binding Sites; Conserved Sequence; DNA/chemistry; DNA/metabolism; Humans; Internet; Mesothelin; Mice; Nucleotide Motifs; Rats; Transcription Factors/metabolism
14.
Bioinformatics ; 33(17): 2756-2758, 2017 Sep 01.
Article in English | MEDLINE | ID: mdl-28903538

ABSTRACT

SUMMARY: Evolutionary information in the form of a Position-Specific Scoring Matrix (PSSM) is a widely used and highly informative representation of protein sequences. Accordingly, PSSM-based feature descriptors have been successfully applied to improve the performance of various predictors of protein attributes. Even though a number of algorithms have been proposed in previous studies, there is currently no universal web server or toolkit available for generating this wide variety of descriptors. Here, we present POSSUM (Position-Specific Scoring matrix-based feature generator for machine learning), a versatile toolkit with an online web server that can generate 21 types of PSSM-based feature descriptors, thereby addressing a crucial need for bioinformaticians and computational biologists. We envisage that this comprehensive toolkit will be widely used as a powerful tool to facilitate feature extraction, selection, and benchmarking of machine learning-based models, thereby contributing to a more effective analysis and modeling pipeline for bioinformatics research. AVAILABILITY AND IMPLEMENTATION: http://possum.erc.monash.edu/. CONTACT: trevor.lithgow@monash.edu or jiangning.song@monash.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods; Machine Learning; Position-Specific Scoring Matrices; Sequence Analysis, Protein/methods; Software
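
To show how a PSSM, as described in the POSSUM record above, can be turned into a fixed-length feature vector, the sketch below averages each column of the matrix over all sequence positions. This is a generic, commonly used style of PSSM-derived descriptor written for illustration; whether it corresponds exactly to any of POSSUM's 21 descriptors is an assumption, and the function name `column_means` is hypothetical. A real PSSM has one row per residue and 20 columns; the toy matrix here uses 3 columns to stay readable.

```python
def column_means(pssm):
    """Average every PSSM column over all sequence positions (rows).

    pssm is a list of equal-length rows, one row per residue in the protein.
    The result has one value per column, independent of protein length.
    """
    if not pssm:
        return []
    n_rows = len(pssm)
    n_cols = len(pssm[0])
    return [sum(row[j] for row in pssm) / n_rows for j in range(n_cols)]


# Toy matrix (2 positions x 3 columns), invented for illustration only.
toy_pssm = [[1, 2, 3], [3, 4, 5]]
print(column_means(toy_pssm))
```

The key property is that proteins of different lengths map to vectors of the same size, which is what most machine-learning models require.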
15.
Sci Rep ; 7: 45509, 2017 03 30.
Article in English | MEDLINE | ID: mdl-28358052

ABSTRACT

Measuring the altered gene expression level and identifying differentially expressed genes/proteins during HIV infection, replication and latency is fundamental for broadening our understanding of the mechanisms of HIV infection and T-cell dysfunction. Such studies are crucial for developing effective strategies for virus eradication from the body. Inspired by the availability and enrichment of gene expression data during HIV infection, replication and latency, in this study, we proposed a novel compendium termed HIVed (HIV expression database; http://hivlatency.erc.monash.edu/) that harbours comprehensive functional annotations of proteins, whose genes have been shown to be dysregulated during HIV infection, replication and latency using different experimental designs and measurements. We manually curated a variety of third-party databases for structural and functional annotations of the protein entries in HIVed. With the goal of benefiting HIV related research, we collected a number of biological annotations for all the entries in HIVed besides their expression profile, including basic protein information, Gene Ontology terms, secondary structure, HIV-1 interaction and pathway information. We hope this comprehensive protein-centric knowledgebase can bridge the gap between the understanding of differentially expressed genes and the functions of their protein products, facilitating the generation of novel hypotheses and treatment strategies to fight against the HIV pandemic.


Subjects
Databases, Chemical; Databases, Genetic; Gene Expression Profiling; HIV Infections/pathology; HIV-1/physiology; Virus Latency; Virus Replication; HIV Infections/virology; Humans; Molecular Sequence Annotation
16.
Sci Rep ; 7: 41031, 2017 01 23.
Article in English | MEDLINE | ID: mdl-28112271

ABSTRACT

Bacteria translocate effector molecules to host cells through highly evolved secretion systems. By definition, the function of these effector proteins is to manipulate host cell biology and the sequence, structural and functional annotations of these effector proteins will provide a better understanding of how bacterial secretion systems promote bacterial survival and virulence. Here we developed a knowledgebase, termed SecretEPDB (Bacterial Secreted Effector Protein DataBase), for effector proteins of type III secretion system (T3SS), type IV secretion system (T4SS) and type VI secretion system (T6SS). SecretEPDB provides enriched annotations of the aforementioned three classes of effector proteins by manually extracting and integrating structural and functional information from currently available databases and the literature. The database is conservative and strictly curated to ensure that every effector protein entry is supported by experimental evidence that demonstrates it is secreted by a T3SS, T4SS or T6SS. The annotations of effector proteins documented in SecretEPDB are provided in terms of protein characteristics, protein function, protein secondary structure, Pfam domains, metabolic pathway and evolutionary details. It is our hope that this integrated knowledgebase will serve as a useful resource for biological investigation and the generation of new hypotheses for research efforts aimed at bacterial secretion systems.


Subjects
Bacteria/metabolism; Bacterial Proteins/metabolism; Databases, Factual; Type III Secretion Systems/metabolism; Type IV Secretion Systems/metabolism; Type VI Secretion Systems/metabolism; Virulence Factors/metabolism; Bacterial Proteins/chemistry; Bacterial Proteins/genetics; Evolution, Molecular; Host-Pathogen Interactions; Internet; Protein Structure, Secondary; Virulence Factors/chemistry; Virulence Factors/genetics
17.
Brief Bioinform ; 18(2): 348-355, 2017 03 01.
Article in English | MEDLINE | ID: mdl-26984618

ABSTRACT

There is a clear demand for hands-on bioinformatics training. The development of bioinformatics workshop content is both time-consuming and expensive. Therefore, enabling trainers to develop bioinformatics workshops in a way that facilitates reuse is becoming increasingly important. The most widespread practice for sharing workshop content is through making PDF, PowerPoint and Word documents available online. While this effort is to be commended, such content is usually not so easy to reuse or repurpose and does not capture all the information required for a third party to rerun a workshop. We present an open, collaborative framework for developing and maintaining reusable and shareable hands-on training workshop content.


Subjects
Computational Biology; Cooperative Behavior; Humans
18.
Brief Bioinform ; 18(3): 537-544, 2017 05 01.
Article in English | MEDLINE | ID: mdl-27084333

ABSTRACT

The Bioinformatics Training Platform (BTP) has been developed to provide access to the computational infrastructure required to deliver sophisticated hands-on bioinformatics training courses. The BTP is a cloud-based solution that is in active use for delivering next-generation sequencing training to Australian researchers at geographically dispersed locations. The BTP was built to provide an easy, accessible, consistent and cost-effective approach to delivering workshops at host universities and organizations with a high demand for bioinformatics training but lacking the dedicated bioinformatics training suites required. To support broad uptake of the BTP, the platform has been made compatible with multiple cloud infrastructures. The BTP is an open-source and open-access resource. To date, 20 training workshops have been delivered to over 700 trainees at over 10 venues across Australia using the BTP.


Subjects
Computational Biology , Australia , High-Throughput Nucleotide Sequencing , Universities
19.
Sci Rep ; 6: 34595, 2016 10 06.
Article in English | MEDLINE | ID: mdl-27708373

ABSTRACT

Glycosylation plays an important role in cell-cell adhesion, ligand-binding and subcellular recognition. Current approaches for predicting protein glycosylation are primarily based on sequence-derived features, while little work has been done to systematically assess the importance of structural features to glycosylation prediction. Here, we propose a novel bioinformatics method called GlycoMinestruct (http://glycomine.erc.monash.edu/Lab/GlycoMine_Struct/) for improved prediction of human N- and O-linked glycosylation sites by combining sequence and structural features in an integrated computational framework with a two-step feature-selection strategy. Experiments indicated that GlycoMinestruct outperformed NGlycPred, the only predictor that incorporated both sequence and structure features, achieving AUC values of 0.941 and 0.922 for N- and O-linked glycosylation, respectively, on an independent test dataset. We applied GlycoMinestruct to screen the human structural proteome and obtained high-confidence predictions for N- and O-linked glycosylation sites. GlycoMinestruct can be used as a powerful tool to expedite the discovery of glycosylation events and substrates to facilitate hypothesis-driven experimental studies.


Subjects
Computational Biology , Glycoproteins , Proteome , Protein Sequence Analysis/methods , Software , Glycoproteins/chemistry , Glycoproteins/genetics , Glycosylation , Humans , Proteome/chemistry , Proteome/genetics
20.
Brief Bioinform ; 14(5): 563-74, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23543352

ABSTRACT

The widespread adoption of high-throughput next-generation sequencing (NGS) technology among the Australian life science research community is highlighting an urgent need to up-skill biologists in tools required for handling and analysing their NGS data. There is currently a shortage of cutting-edge bioinformatics training courses in Australia as a consequence of a scarcity of skilled trainers with time and funding to develop and deliver training courses. To address this, a consortium of Australian research organizations, including Bioplatforms Australia, the Commonwealth Scientific and Industrial Research Organisation and the Australian Bioinformatics Network, has been collaborating with the EMBL-EBI training team. A group of Australian bioinformaticians attended the train-the-trainer workshop to improve training skills in developing and delivering bioinformatics workshop curriculum. A 2-day NGS workshop was jointly developed to provide hands-on knowledge and understanding of typical NGS data analysis workflows. The road show-style workshop was successfully delivered at five geographically distant venues in Australia using the newly established Australian NeCTAR Research Cloud. We highlight the challenges we had to overcome at different stages from design to delivery, including the establishment of an Australian bioinformatics training network and the computing infrastructure and resource development. A virtual machine image, workshop materials and scripts for configuring a machine with workshop contents have all been made available under a Creative Commons Attribution 3.0 Unported License. This means participants continue to have convenient access to an environment with which they had become familiar, and bioinformatics trainers are able to access and reuse these resources.


Subjects
Computational Biology/education , High-Throughput Nucleotide Sequencing/statistics & numerical data , Australia , Computer-Assisted Instruction/methods , Cooperative Behavior , Curriculum , Teaching
...