Results 1-11 of 11
1.
Bioinformatics ; 38(17): 4206-4213, 2022 09 02.
Article in English | MEDLINE | ID: mdl-35801909

ABSTRACT

MOTIVATION: The molecular subtyping of gastric cancer (adenocarcinoma) into four main subtypes based on integrated multiomics profiles, as proposed by The Cancer Genome Atlas (TCGA) initiative, is an effective strategy for patient stratification. However, this approach requires multiple technological platforms and is expensive and time-consuming to perform. A computational approach that infers molecular subtypes from histopathological image data could serve as a practical, cost- and time-efficient complementary tool for prognosis and clinical management. RESULTS: Here, we propose a deep learning ensemble approach, called DEMoS, that predicts the four recognized molecular subtypes of gastric cancer directly from histopathological images: (i) Epstein-Barr virus (EBV)-infected, (ii) microsatellite instability (MSI), (iii) genomically stable (GS) and (iv) chromosomally unstable (CIN) tumors. On an independent test dataset, DEMoS achieved tile-level area under the receiver operating characteristic curve (AUROC) values of 0.785, 0.668, 0.762 and 0.811 for these four subtypes, respectively, and patient-level AUROC values of 0.897, 0.764, 0.890 and 0.898, respectively. Thus, these four subtypes are well predicted by DEMoS. Benchmarking experiments further suggest that DEMoS achieves improved classification performance for image-based subtyping and prevents model overfitting. This study highlights the feasibility of using a deep learning ensemble-based method to rapidly and reliably subtype gastric cancer (adenocarcinoma) solely from histopathological image features. AVAILABILITY AND IMPLEMENTATION: All whole slide images used in this study were collected from the TCGA database. This study builds upon our previously published HEAL framework, with related documentation and tutorials available at http://heal.erc.monash.edu.au. The source code and related models are freely accessible at https://github.com/Docurdt/DEMoS.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
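The abstract reports AUROC at both tile and patient level. The sketch below shows how the two can differ for one subtype treated one-vs-rest: tile scores are pooled into one score per patient and AUROC is computed as the probability that a random positive outscores a random negative. Mean pooling is an assumption (the abstract does not say how DEMoS aggregates tiles), and `patient_scores`, `auroc` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def auroc(scores, labels):
    """AUROC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def patient_scores(tile_preds):
    """Pool (patient_id, probability) tile predictions into one score per patient by averaging (assumed rule)."""
    by_patient = defaultdict(list)
    for patient_id, prob in tile_preds:
        by_patient[patient_id].append(prob)
    return {pid: mean(probs) for pid, probs in by_patient.items()}


# Toy one-vs-rest task for a single subtype (e.g. EBV vs. all others); values are illustrative only.
tiles = [("p1", 0.9), ("p1", 0.6), ("p2", 0.2), ("p2", 0.7), ("p3", 0.3), ("p3", 0.5)]
truth = {"p1": 1, "p2": 0, "p3": 0}

tile_auc = auroc([p for _, p in tiles], [truth[pid] for pid, _ in tiles])
pooled = patient_scores(tiles)
ids = sorted(pooled)
patient_auc = auroc([pooled[i] for i in ids], [truth[i] for i in ids])
print(tile_auc, patient_auc)  # 0.875 1.0
```

Averaging lets an occasional misleading tile (here p2's 0.7) be outweighed by the rest of the slide, which is one plausible reason patient-level scores can exceed tile-level ones.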


Subjects
Adenocarcinoma; Deep Learning; Stomach Neoplasms; Humans; Stomach Neoplasms/diagnostic imaging; Stomach Neoplasms/genetics; Adenocarcinoma/diagnostic imaging; Adenocarcinoma/genetics; Microsatellite Instability
2.
NPJ Precis Oncol ; 6(1): 45, 2022 Jun 23.
Article in English | MEDLINE | ID: mdl-35739342

ABSTRACT

Gastric cancer is one of the deadliest cancers worldwide. An accurate prognosis is essential for effective clinical assessment and treatment. Spatial patterns in the tumor microenvironment (TME) are conceptually indicative of the staging and progression of gastric cancer. By integrating multiplexed immunohistochemistry (mIHC) images and transforming them into Cell-Graphs that capture TME spatial patterns, we propose an artificial intelligence-powered, graph neural network-based approach, termed Cell-Graph Signature (CGSignature), for the digital staging of the TME and precise prediction of patient survival in gastric cancer. In this study, patient survival prediction is formulated as either a binary (short-term and long-term) or a ternary (short-term, medium-term and long-term) classification task. Extensive benchmarking experiments demonstrate that CGSignature achieves outstanding performance, with an area under the receiver operating characteristic curve of 0.960 ± 0.01 for binary classification and 0.771 ± 0.024 to 0.904 ± 0.012 for ternary classification. Moreover, Kaplan-Meier survival analysis indicates that the "digital grade" cancer staging produced by CGSignature discriminates both binary and ternary classes with statistical significance (P < 0.0001), significantly outperforming the AJCC 8th edition Tumor Node Metastasis staging system. Using Cell-Graphs extracted from mIHC images, CGSignature improves the assessment of the link between TME spatial patterns and patient prognosis. Our study suggests the feasibility and benefits of such an artificial intelligence-powered digital staging system in diagnostic pathology and precision oncology.
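The abstract does not describe how a Cell-Graph is built. A common construction, assumed here, treats each segmented cell as a node and links cells whose centroids lie within a fixed distance; a graph neural network would then learn from node features and this adjacency. The radius rule, the helper names, the coordinates and the marker labels below are all hypothetical, not CGSignature's documented procedure.

```python
import math
from collections import Counter
from itertools import combinations


def build_cell_graph(cells, radius):
    """Link every pair of cells whose centroids lie within `radius` (an assumed edge rule).

    `cells` maps a cell id to (x, y, marker); returns an adjacency dict of neighbour sets.
    """
    adjacency = {cid: set() for cid in cells}
    for (a, (xa, ya, _)), (b, (xb, yb, _)) in combinations(cells.items(), 2):
        if math.dist((xa, ya), (xb, yb)) <= radius:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def neighbour_markers(cells, adjacency, cid):
    """Count the markers of a cell's neighbours, a simple spatial node feature."""
    return Counter(cells[n][2] for n in adjacency[cid])


# Hypothetical cells: coordinates in micrometres, marker labels chosen for illustration.
cells = {
    "c1": (0.0, 0.0, "CD8"),
    "c2": (3.0, 4.0, "tumor"),   # 5.0 from c1
    "c3": (10.0, 0.0, "CD68"),   # 10.0 from c1, about 8.06 from c2
    "c4": (0.0, 5.0, "tumor"),   # 5.0 from c1, about 3.16 from c2
}
graph = build_cell_graph(cells, radius=6.0)
print(sorted(graph["c1"]), neighbour_markers(cells, graph, "c1"))
```

The pairwise loop is quadratic in the number of cells; for whole images a spatial index (e.g. a k-d tree) would be the usual replacement.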

3.
Bioinformatics ; 37(21): 3986-3988, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34061168

ABSTRACT

MOTIVATION: Tumor tile selection is a necessary prerequisite for patch-based analysis of cancer whole slide images, but it is labor-intensive and requires expertise. Whole slides are annotated as tumor or tumor-free, but the tiles within a tumor slide are not. Because all tiles within a tumor-free slide are tumor-free, they can be used to capture tumor-free patterns with a one-class learning strategy. RESULTS: We present a Python package, termed OCTID, which combines a pretrained convolutional neural network (CNN) model, Uniform Manifold Approximation and Projection (UMAP) and a one-class support vector machine to classify tumor tiles accurately using a training set of tumor-free tiles only. Benchmarking experiments on four H&E image datasets achieved an F1-score of 0.90 ± 0.06, a Matthews correlation coefficient of 0.93 ± 0.05 and an accuracy of 0.94 ± 0.03. AVAILABILITY AND IMPLEMENTATION: Detailed information can be found in the Supplementary File. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
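For reference, the three metrics reported for OCTID (and the MCC reported for Cascleave below) can all be computed from binary confusion-matrix counts. The sketch below uses the standard definitions; the counts are made up for illustration and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, F1-score, MCC) from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC uses all four cells, so it stays informative when classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f1, mcc


# Hypothetical counts for 100 tiles, "tumor" treated as the positive class.
acc, f1, mcc = binary_metrics(tp=45, fp=5, tn=45, fn=5)
print(f"accuracy={acc:.2f} F1={f1:.2f} MCC={mcc:.2f}")  # accuracy=0.90 F1=0.90 MCC=0.80
```

MCC ranges from -1 to 1, with 0 meaning no better than chance, which is why it can sit well below accuracy on the same predictions.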


Subjects
Image Processing, Computer-Assisted; Neoplasms; Neural Networks, Computer; Programming Languages; Neoplasms/diagnostic imaging; Humans; Image Processing, Computer-Assisted/methods; Machine Learning; Datasets as Topic
4.
Bioinformatics ; 37(22): 4291-4295, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34009289

ABSTRACT

MOTIVATION: Digital pathology supports large-scale analysis of histopathological images using deep learning methods. However, applications of deep learning in this area have been limited by the complexity of configuring the computational environment and of hyperparameter optimization, which hinders deployment and reduces reproducibility. RESULTS: Here, we propose HEAL, a deep learning-based automated framework for easy, flexible and multi-faceted histopathological image analysis. We demonstrate its utility and functionality through two case studies on lung cancer and one on colon cancer. By leveraging Docker, HEAL serves as an end-to-end tool for complex histopathological analysis and enables deep learning in a broad range of cancer image analysis applications. AVAILABILITY AND IMPLEMENTATION: The Docker image of HEAL is available at https://hub.docker.com/r/docurdt/heal and related documentation and datasets are available at http://heal.erc.monash.edu.au. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Colonic Neoplasms; Deep Learning; Humans; Software; Reproducibility of Results
5.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33454737

ABSTRACT

Neopeptide-based immunotherapy has been recognised as a promising approach for the treatment of cancers. For neopeptides to be recognised by CD8+ T cells and induce an immune response, they must first bind to human leukocyte antigen class I (HLA-I) molecules. Most epitope prediction tools therefore rely on predicting such binding. Mass spectrometry has expanded the number of naturally presented HLA ligands that can be used to develop such predictors. However, few efforts have focused on integrating these experimental data with computational algorithms to efficiently develop up-to-date predictors. Here, we present Anthem for accurate HLA-I binding prediction. In particular, we have developed a user-friendly framework that supports the development of customisable HLA-I binding prediction models, to meet the challenges posed by the rapidly increasing availability of large amounts of immunopeptidomic data. Our extensive evaluation on both independent and experimental datasets shows that Anthem achieves an overall area under the curve (AUC) similar to or higher than that of other contemporary tools. It is anticipated that Anthem will provide a unique opportunity for non-expert users to analyse and interpret their own in-house or publicly deposited datasets.
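Binding predictors such as Anthem score candidate peptides that must be generated upstream. A minimal sketch, assuming a single point mutation and the typical 8-11-residue length range of HLA-I ligands, enumerates every window that covers the mutated residue; the function name and the toy sequence are hypothetical, and this is not part of Anthem's interface.

```python
def neopeptide_candidates(protein, mut_pos, lengths=range(8, 12)):
    """Yield every peptide of the given lengths that covers the residue at 0-based index `mut_pos`."""
    for k in lengths:
        first = max(0, mut_pos - k + 1)          # leftmost start that still covers mut_pos
        last = min(mut_pos, len(protein) - k)    # rightmost start that stays inside the protein
        for start in range(first, last + 1):
            yield protein[start:start + k]


# Hypothetical 20-residue mutant protein with the substituted residue (M) at index 10.
mutant = "ACDEFGHIKLMNPQRSTVWY"
peptides = list(neopeptide_candidates(mutant, 10))
print(len(peptides), peptides[:2])
```

Each candidate would then be paired with the patient's HLA-I alleles and scored by a binding predictor.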


Subjects
Algorithms; Databases, Protein; Epitopes; Histocompatibility Antigens Class I; Peptides; Software; Epitopes/chemistry; Epitopes/immunology; Histocompatibility Antigens Class I/chemistry; Histocompatibility Antigens Class I/immunology; Humans; Immunotherapy; Neoplasms/immunology; Neoplasms/therapy; Peptides/chemistry; Peptides/immunology
6.
BMC Bioinformatics ; 20(1): 602, 2019 Nov 21.
Article in English | MEDLINE | ID: mdl-31752668

ABSTRACT

BACKGROUND: S-sulphenylation is a ubiquitous protein post-translational modification (PTM) in which an S-hydroxyl (-SOH) bond is formed via reversible oxidation of the sulfhydryl group of cysteine (C). Recent experimental studies have revealed that S-sulphenylation plays critical roles in many biological functions, such as protein regulation and cell signaling. Bioinformatic advances have enabled high-throughput in silico screening of protein S-sulphenylation sites, significantly reducing the time and labour costs traditionally required for its experimental investigation. RESULTS: In this study, we propose a novel hybrid computational framework, termed SIMLIN, for accurate prediction of protein S-sulphenylation sites using a multi-stage neural network-based ensemble learning model that integrates both protein sequence-derived and protein structural features. Benchmarking experiments against current state-of-the-art S-sulphenylation predictors demonstrated that SIMLIN delivers competitive prediction performance. On the independent test dataset, SIMLIN achieved 88.0% prediction accuracy and an AUC of 0.82, outperforming existing methods. CONCLUSIONS: In summary, SIMLIN predicts human S-sulphenylation sites with high accuracy, thereby facilitating biological hypothesis generation and experimental validation. The web server, datasets, and online instructions are freely available at http://simlin.erc.monash.edu/ for academic purposes.
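Site predictors of this kind typically start by extracting a fixed-length sequence window around each candidate cysteine, from which sequence-derived features are then computed. The sketch below assumes a symmetric window of ±10 residues padded with '-' near the termini; the window size, padding character and helper name are assumptions, not SIMLIN's documented settings.

```python
def cysteine_windows(sequence, flank=10, pad="-"):
    """Return (1-based position, window) for every cysteine, each window centred on the C.

    Windows have length 2 * flank + 1; positions near either terminus are padded with `pad`.
    """
    padded = pad * flank + sequence + pad * flank
    return [
        (i + 1, padded[i:i + 2 * flank + 1])
        for i, residue in enumerate(sequence)
        if residue == "C"
    ]


# Toy sequence and a short flank so the output is easy to read.
print(cysteine_windows("MACDC", flank=2))  # [(3, 'MACDC'), (5, 'CDC--')]
```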


Subjects
Algorithms; Computational Biology/methods; Proteome/metabolism; Sulfamerazine/metabolism; Amino Acid Motifs; Amino Acid Sequence; Area Under Curve; Conserved Sequence; Databases, Protein; Gene Ontology; Humans; Neural Networks, Computer; ROC Curve; Software
7.
Bioinformatics ; 34(14): 2499-2502, 2018 07 15.
Article in English | MEDLINE | ID: mdl-29528364

ABSTRACT

Summary: Structural and physicochemical descriptors extracted from sequence data have been widely used to represent sequences and to predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNA and RNA. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature can calculate and extract a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection and dimensionality reduction algorithms, greatly facilitating the training, analysis and benchmarking of machine-learning models. iFeature is freely available both as an online web server and as a stand-alone toolkit. Availability and implementation: http://iFeature.erc.monash.edu/; https://github.com/Superzchen/iFeature/. Supplementary information: Supplementary data are available at Bioinformatics online.
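Two of the simplest sequence encodings of this kind are amino acid composition (AAC) and dipeptide composition (DPC). The sketch below computes both in plain Python to show what a numerical feature vector looks like; it is not iFeature's own code or command-line interface, and it assumes sequences of at least two residues drawn from the 20 standard amino acids.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Amino acid composition: fraction of the sequence made up by each standard residue (20 values)."""
    return [sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS]


def dpc(sequence):
    """Dipeptide composition: fraction of overlapping residue pairs for each ordered pair (400 values)."""
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    return [pairs.count(a + b) / len(pairs) for a, b in product(AMINO_ACIDS, repeat=2)]


features = aac("ACCA") + dpc("ACCA")
print(len(features))  # 420
```

Fixed-length vectors like these are what make sequences of different lengths usable as input to standard machine-learning models.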


Subjects
Molecular Sequence Annotation; Peptides/metabolism; Proteins/metabolism; Sequence Analysis, Protein/methods; Software; Machine Learning; Peptides/chemistry; Peptides/physiology; Protein Conformation; Proteins/chemistry; Proteins/physiology
8.
JCO Clin Cancer Inform ; 1: 1-10, 2017 11.
Article in English | MEDLINE | ID: mdl-30657390

ABSTRACT

PURPOSE: Prospective epidemiologic surveillance of invasive mold disease (IMD) in hematology patients is hampered by the absence of a reliable laboratory prompt. This study develops an expert system for electronic surveillance of IMD that combines probabilities derived from natural language processing (NLP) of computed tomography (CT) reports with microbiology and antifungal drug data to improve prediction of IMD. METHODS: Microbiology indicators and antifungal drug-dispensing data were extracted from the hospital information systems of three tertiary hospitals for 123 hematology-oncology patients. Of this group, 64 case patients had 26 probable/proven IMD according to international definitions, and 59 patients were uninfected controls. Probabilities derived from NLP, combined with medical expertise, identified patients at high likelihood of IMD; the remaining patients were processed by a machine-learning classifier trained on all available features. RESULTS: Compared with the baseline text classifier, the expert system incorporating the best-performing algorithm (naïve Bayes) improved specificity from 50.8% (95% CI, 37.5% to 64.1%) to 74.6% (95% CI, 61.6% to 85.0%), reducing false positives by 48%, from 29 to 15; slightly improved sensitivity from 96.9% (95% CI, 89.2% to 99.6%) to 98.4% (95% CI, 91.6% to 100%); and improved the area under the receiver operating characteristic curve from 73.9% (95% CI, 67.1% to 80.6%) to 92.8% (95% CI, 88% to 97.5%). CONCLUSION: An expert system that uses multiple data sources (CT reports, microbiology, antifungal drug dispensing) is a promising approach to continuous prospective surveillance of IMD in the hospital, and it produces fewer false notifications (positives) than NLP of CT reports alone. Our expert system could provide decision support for IMD surveillance, which is critical to antifungal stewardship and to improving supportive care in cancer.
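The reported point estimates are consistent with 59 uninfected controls as the specificity denominator and 64 case patients as the sensitivity denominator; the arithmetic below reproduces them. The false-positive counts (29 and 15) come from the abstract, whereas the true-positive counts (62 and 63 of 64) are back-calculated from the reported sensitivities and are an assumption. The confidence intervals are not recomputed here.

```python
# Denominators inferred from the abstract: 59 uninfected controls and 64 case patients (assumption).
CONTROLS, CASES = 59, 64
FP_BEFORE, FP_AFTER = 29, 15    # false positives reported in the abstract
TP_BEFORE, TP_AFTER = 62, 63    # true positives back-calculated from the sensitivities (assumption)

specificity_before = (CONTROLS - FP_BEFORE) / CONTROLS
specificity_after = (CONTROLS - FP_AFTER) / CONTROLS
sensitivity_before = TP_BEFORE / CASES
sensitivity_after = TP_AFTER / CASES
fp_reduction = (FP_BEFORE - FP_AFTER) / FP_BEFORE

for name, value in [
    ("specificity, text classifier", specificity_before),
    ("specificity, expert system", specificity_after),
    ("sensitivity, text classifier", sensitivity_before),
    ("sensitivity, expert system", sensitivity_after),
    ("false-positive reduction", fp_reduction),
]:
    print(f"{name}: {value:.1%}")
```

The last line prints 48.3%, which the abstract rounds to 48%.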


Subjects
Invasive Fungal Infections/diagnosis; Invasive Fungal Infections/therapy; Medical Oncology; Monitoring, Physiologic/methods; Neoplasms/diagnosis; Neoplasms/therapy; Telemedicine/methods; Adult; Aged; Aged, 80 and over; Algorithms; Antifungal Agents/therapeutic use; Case-Control Studies; Combined Modality Therapy; Electronic Health Records; Expert Systems; Female; Humans; Invasive Fungal Infections/etiology; Machine Learning; Male; Medical Oncology/methods; Microbiological Techniques; Middle Aged; Natural Language Processing; Neoplasms/complications; ROC Curve; Sensitivity and Specificity; Tomography, X-Ray Computed; Young Adult
9.
PLoS One ; 7(11): e50300, 2012.
Article in English | MEDLINE | ID: mdl-23209700

ABSTRACT

The ability to catalytically cleave protein substrates after synthesis is fundamental for all forms of life. Accordingly, site-specific proteolysis is one of the most important post-translational modifications. The key to understanding the physiological role of a protease is to identify its natural substrate(s). Knowledge of the substrate specificity of a protease can dramatically improve our ability to predict its target protein substrates, but this information must be used effectively for in silico approaches to identify protein substrates efficiently. To address this problem, we present PROSPER, an integrated feature-based server for in silico identification of protease substrates and their cleavage sites for twenty-four different proteases. PROSPER combines established specificity information for these proteases (derived from the MEROPS database) with a machine learning approach to predict protease cleavage sites using different but complementary sequence and structure characteristics. Features used by PROSPER include local amino acid sequence profile, predicted secondary structure, solvent accessibility and predicted native disorder. Thus, for proteases with known amino acid specificity, PROSPER provides a convenient, pre-prepared tool for identifying protein substrates of these enzymes. Systematic prediction analysis for the twenty-four proteases included in the database so far revealed that the features included in the tool substantially improve cleavage site prediction, as evidenced by their contribution to identifying known cleavage sites in substrates of these enzymes. Compared with two state-of-the-art prediction tools, PoPS and SitePrediction, PROSPER achieves greater accuracy and coverage. To our knowledge, PROSPER is the first comprehensive server capable of predicting cleavage sites of multiple proteases within a single substrate sequence using machine learning techniques. It is freely available at http://lightning.med.monash.edu.au/PROSPER/.
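Local sequence features for cleavage-site prediction are computed over the residues flanking the scissile bond, conventionally labelled P4...P1 | P1'...P4' in Schechter-Berger notation. A minimal sketch of extracting such a window is below; the eight-residue span, the padding character, the helper name and the toy thrombin-like substrate are assumptions, not PROSPER's documented window.

```python
def cleavage_window(substrate, p1_index, n_left=4, n_right=4, pad="-"):
    """Split the residues around a scissile bond into (P4..P1, P1'..P4') in Schechter-Berger notation.

    `p1_index` is the 0-based index of the P1 residue; the bond lies between P1 and P1'.
    Windows that run past either terminus are padded with `pad`.
    """
    left = substrate[max(0, p1_index - n_left + 1):p1_index + 1].rjust(n_left, pad)
    right = substrate[p1_index + 1:p1_index + 1 + n_right].ljust(n_right, pad)
    return left, right


# Toy substrate containing a thrombin-like LVPR|GS motif; P1 is the Arg at index 5.
print(cleavage_window("AALVPRGSAA", 5))  # ('LVPR', 'GSAA')
print(cleavage_window("PRGS", 1))        # ('--PR', 'GS--')
```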


Subjects
Peptide Hydrolases/chemistry; Proteins/chemistry; Algorithms; Animals; Artificial Intelligence; Catalysis; Cattle; Computational Biology/methods; Granzymes/chemistry; Humans; Hydrolysis; Mice; Models, Statistical; Peptides/chemistry; Protein Binding; Protein Conformation; Protein Processing, Post-Translational; ROC Curve; Software; Solvents/chemistry; Substrate Specificity
10.
Bioinformatics ; 26(6): 752-60, 2010 Mar 15.
Article in English | MEDLINE | ID: mdl-20130033

ABSTRACT

MOTIVATION: The caspase family of cysteine proteases plays essential roles in key biological processes such as programmed cell death, differentiation, proliferation, necrosis and inflammation. The complete repertoire of caspase substrates remains to be fully characterized. Accordingly, systematic computational screening of caspase substrate cleavage sites may provide insight into the substrate specificity of caspases and further facilitate the discovery of putative novel substrates. RESULTS: In this article, we develop an approach (termed Cascleave) to predict both classical (i.e. following a P1 Asp) and non-typical caspase cleavage sites. Using local sequence-derived profiles, Cascleave successfully predicted 82.2% of the known substrate cleavage sites, with a Matthews correlation coefficient (MCC) of 0.667. We found that prediction performance could be further improved by incorporating information such as predicted solvent accessibility and whether a cleavage sequence lies in a region that is most likely natively unstructured. Novel bi-profile Bayesian signatures significantly improved prediction performance and yielded the best results, with an overall accuracy of 87.6% and an MCC of 0.747, higher than the accuracy of published methods that rely essentially on amino acid sequence alone. It is anticipated that Cascleave will be a powerful tool for predicting novel caspase substrate cleavage sites and will shed new light on the unknown caspase-substrate interactivity relationship. AVAILABILITY: http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/Cascleave/ CONTACT: jiangning.song@med.monash.edu.au; takutsu@kuicr.kyoto-u.ac.jp; james.whisstock@med.monash.edu.au SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
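Bi-profile Bayes (BPB) encodings are generally described as follows: estimate position-specific residue probabilities separately from positive (cleaved) and negative (non-cleaved) training windows, then represent a query window by the probabilities of its residues under both profiles. A minimal sketch of that general formulation is below; the pseudocount, the four-residue window and the toy training windows are assumptions, and Cascleave's exact implementation may differ.

```python
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def position_profile(windows, pseudocount=1.0):
    """Per-position residue probabilities from equal-length training windows (with a pseudocount)."""
    total = len(windows) + pseudocount * len(ALPHABET)
    profile = []
    for pos in range(len(windows[0])):
        counts = Counter(w[pos] for w in windows)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in ALPHABET})
    return profile


def bpb_features(window, pos_profile, neg_profile):
    """Bi-profile encoding: probabilities of the window's residues under the positive, then negative profile."""
    return ([pos_profile[i][aa] for i, aa in enumerate(window)]
            + [neg_profile[i][aa] for i, aa in enumerate(window)])


# Toy P4-P1 windows: cleaved sites ending in Asp (P1) versus non-cleaved sites.
cleaved = ["DEVD", "DETD", "IETD"]
uncleaved = ["AKLM", "GSTA", "DEVA"]
features = bpb_features("DEVD", position_profile(cleaved), position_profile(uncleaved))
print(len(features))  # 8
```

The resulting 2n-dimensional vector (n = window length) would then be passed to a classifier; the sketch assumes every residue belongs to the 20-letter alphabet.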


Subjects
Caspases/chemistry; Proteomics/methods; Software; Caspases/metabolism; Databases, Protein; Substrate Specificity
11.
Genome Res ; 17(7): 1118-27, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17567984

ABSTRACT

Over 3% of human proteins contain single amino acid repeats (repeat-containing proteins, RCPs). Many repeats (homopeptides) localize to important proteins involved in transcription, and the expansion of certain repeats, in particular poly-Q and poly-A tracts, can also lead to the development of neurological diseases. Previous studies have suggested that homopeptide composition results from the presence of G+C-rich tracts in the encoding genes and that expansion occurs via replication slippage. Here, we have performed a large-scale genomic analysis of the variation in genes encoding RCPs in 13 species and present these data in an online database (http://repeats.med.monash.edu.au/genetic_analysis/). This resource allows rapid comparison and analysis of RCPs, homopeptides, and their underlying genetic tracts across the eukaryotic species considered. We report three major findings. First, homopeptides are biased toward reiteration of a small subset of codons, and there is no G+C or A+T bias relative to the organism's transcriptome. Second, single base pair transversions from the homocodon are unusually common and may represent a mechanism for reducing the rate of homopeptide mutations. Third, homopeptides that are conserved across species lie within regions under stronger purifying selection than nonconserved homopeptides.
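The analysis rests on two steps: locating homopeptides in protein sequences and tallying which codons encode them. A minimal sketch follows; the minimum run length of 5 and the helper names are assumptions (the abstract does not state the study's repeat-length threshold), and the coding sequence is assumed to be spliced and in frame with the protein.

```python
import re
from collections import Counter


def homopeptides(protein, min_length=5):
    """Return (0-based start, residue, length) for each single-residue run of at least `min_length`."""
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_length - 1))
    return [(m.start(), m.group(1), len(m.group(0))) for m in pattern.finditer(protein)]


def tract_codons(cds, start, length):
    """Count the codons that encode residues start..start+length-1 of an in-frame coding sequence."""
    tract = cds[3 * start:3 * (start + length)]
    return Counter(tract[i:i + 3] for i in range(0, len(tract), 3))


protein = "MQQQQQQA"
cds = "ATG" + "CAGCAGCAACAGCAGCAA" + "GCT"  # toy in-frame sequence; CAG and CAA both encode Gln
for start, residue, length in homopeptides(protein):
    print(residue, length, tract_codons(cds, start, length))
```

Comparing such codon tallies with the organism's overall codon usage is one way to test for the codon bias the abstract describes.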


Subjects
Codon/genetics; Evolution, Molecular; Proteins/genetics; Repetitive Sequences, Amino Acid/genetics; 3' Untranslated Regions; 5' Untranslated Regions; Humans; Peptides/chemistry; Peptides/genetics; Polymorphism, Single Nucleotide; Proteins/chemistry