Search | Biblioteca Virtual em Saúde (Virtual Health Library)

1.

Predicting Molecular Subtype and Survival of Rhabdomyosarcoma Patients Using Deep Learning of H&E Images: A Report from the Children's Oncology Group.

Milewski, David; Jung, Hyun; Brown, G Thomas; Liu, Yanling; Somerville, Ben; Lisle, Curtis; Ladanyi, Marc; Rudzinski, Erin R; Choo-Wosoba, Hyoyoung; Barkauskas, Donald A; Lo, Tammy; Hall, David; Linardic, Corinne M; Wei, Jun S; Chou, Hsien-Chao; Skapek, Stephen X; Venkatramani, Rajkumar; Bode, Peter K; Steinberg, Seth M; Zaki, George; Kuznetsov, Igor B; Hawkins, Douglas S; Shern, Jack F; Collins, Jack; Khan, Javed.

Clin Cancer Res ; 29(2): 364-378, 2023 01 17.

Article in English | MEDLINE | ID: mdl-36346688

ABSTRACT

PURPOSE: Rhabdomyosarcoma (RMS) is an aggressive soft-tissue sarcoma that primarily occurs in children and young adults. We previously reported specific genomic alterations in RMS that strongly correlated with survival; however, predicting these mutations or high-risk disease at diagnosis remains a significant challenge. In this study, we used convolutional neural networks (CNNs) to learn histologic features associated with driver mutations and outcome from hematoxylin and eosin (H&E) images of RMS. EXPERIMENTAL DESIGN: Digital whole-slide H&E images were collected from clinically annotated diagnostic tumor samples from 321 patients with RMS enrolled in Children's Oncology Group (COG) trials (1998-2017). Patches were extracted and fed into deep learning CNNs to learn features associated with mutations and relative event-free survival risk. The performance of the trained models was evaluated on independent test data (n = 136) or on holdout test data. RESULTS: The trained CNN could accurately classify alveolar RMS, a high-risk subtype associated with PAX3/7-FOXO1 fusion genes, with an area under the ROC curve (AUC) of 0.85 on an independent test dataset. CNN models trained on mutationally annotated samples identified tumors with RAS pathway mutations with an AUC of 0.67, and tumors with high-risk mutations in MYOD1 or TP53 with AUCs of 0.97 and 0.63, respectively. Remarkably, CNN models were superior to current molecular-clinical risk stratification in predicting event-free and overall survival. CONCLUSIONS: This study demonstrates that high-risk features, including those associated with certain mutations, can be readily identified at diagnosis using deep learning. CNNs are a powerful tool for diagnostic and prognostic prediction in rhabdomyosarcoma and will be tested in prospective COG clinical trials.

Subjects

Deep Learning , Alveolar Rhabdomyosarcoma , Rhabdomyosarcoma , Child , Humans , Young Adult , Eosine Yellowish-(YS) , Hematoxylin , Paired Box Transcription Factors/genetics , Prospective Studies , Rhabdomyosarcoma/diagnosis , Rhabdomyosarcoma/genetics , Alveolar Rhabdomyosarcoma/genetics
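
The ROC values reported in this abstract are areas under the ROC curve (AUC): the probability that a randomly chosen positive case (for example, an alveolar RMS slide) receives a higher score than a randomly chosen negative one. The sketch below computes that quantity directly from per-patient scores. The scores are invented, and averaging patch-level CNN outputs into one score per patient is an assumption for illustration, not necessarily how the study aggregated patches.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for the positive class (e.g. alveolar RMS), 0 otherwise.
    A positive-negative pair with equal scores counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-patient scores (e.g. the mean of patch-level CNN outputs).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(scores, labels))  # 8 of 9 positive-negative pairs ranked correctly
```

Counting pairwise wins is quadratic in the number of patients, which is fine at cohort sizes like those above; a rank-based formula gives the same value faster for large test sets.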

2.

Immuno-transcriptomic profiling of extracranial pediatric solid malignancies.

Brohl, Andrew S; Sindiri, Sivasish; Wei, Jun S; Milewski, David; Chou, Hsien-Chao; Song, Young K; Wen, Xinyu; Kumar, Jeetendra; Reardon, Hue V; Mudunuri, Uma S; Collins, Jack R; Nagaraj, Sushma; Gangalapudi, Vineela; Tyagi, Manoj; Zhu, Yuelin J; Masih, Katherine E; Yohe, Marielle E; Shern, Jack F; Qi, Yue; Guha, Udayan; Catchpoole, Daniel; Orentas, Rimas J; Kuznetsov, Igor B; Llosa, Nicolas J; Ligon, John A; Turpin, Brian K; Leino, Daniel G; Iwata, Shintaro; Andrulis, Irene L; Wunder, Jay S; Toledo, Silvia R C; Meltzer, Paul S; Lau, Ching; Teicher, Beverly A; Magnan, Heather; Ladanyi, Marc; Khan, Javed.

Cell Rep ; 37(8): 110047, 2021 11 23.

Article in English | MEDLINE | ID: mdl-34818552

ABSTRACT

We perform an immunogenomics analysis using whole-transcriptome sequencing of 657 pediatric extracranial solid cancer samples representing 14 diagnoses, and additionally use transcriptomes of 131 pediatric cancer cell lines and 147 normal tissue samples for comparison. We describe patterns of infiltrating immune cells, T cell receptor (TCR) clonal expansion, and translationally relevant immune checkpoints. We find that tumor-infiltrating lymphocytes and TCR counts vary widely across cancer types and within each diagnosis and, notably, are significantly predictive of survival in osteosarcoma patients. We identify potential cancer-specific immunotherapeutic targets for adoptive cell therapies, including cell-surface proteins, tumor germline antigens, and lineage-specific transcription factors. Using an orthogonal immunopeptidomics approach, we find several potential immunotherapeutic targets in osteosarcoma and Ewing sarcoma and validate PRAME as a bona fide multi-pediatric cancer target. Importantly, this work provides a critical framework for immune targeting of extracranial solid tumors using parallel immuno-transcriptomic and -peptidomic approaches.

Subjects

Neoplasms/genetics , Neoplasms/immunology , Transcriptome/genetics , Adolescent , Neoplasm Antigens , Tumor Cell Line , Child , Preschool Child , Female , Gene Expression/genetics , Gene Expression Profiling/methods , Humans , Immune Checkpoint Proteins/genetics , Immune Checkpoint Proteins/immunology , Immunogenetics/methods , Adoptive Immunotherapy , Infant , Tumor-Infiltrating Lymphocytes/immunology , Male , T-Cell Antigen Receptors/genetics , T-Cell Antigen Receptors/immunology , Transcriptome/immunology , Tumor Microenvironment , Exome Sequencing/methods

3.

Outcome-Related Signatures Identified by Whole Transcriptome Sequencing of Resectable Stage III/IV Melanoma Evaluated after Starting Hu14.18-IL2.

Yang, Richard K; Kuznetsov, Igor B; Ranheim, Erik A; Wei, Jun S; Sindiri, Sivasish; Gryder, Berkley E; Gangalapudi, Vineela; Song, Young K; Patel, Viharkumar; Hank, Jacquelyn A; Zuleger, Cindy; Erbe, Amy K; Morris, Zachary S; Quale, Renae; Kim, KyungMann; Albertini, Mark R; Khan, Javed; Sondel, Paul M.

Clin Cancer Res ; 26(13): 3296-3306, 2020 07 01.

Article in English | MEDLINE | ID: mdl-32152202

ABSTRACT

PURPOSE: We analyzed whole transcriptome sequencing of tumors from 23 patients with stage III or IV melanoma enrolled in a pilot trial of the anti-GD2 immunocytokine hu14.18-IL2 to identify predictive immune and/or tumor biomarkers in patients with melanoma at high risk for recurrence. EXPERIMENTAL DESIGN: Patients were randomly assigned to receive the first of three monthly courses of hu14.18-IL2 immunotherapy either before (Group A) or after (Group B) complete surgical resection of all known disease. Tumors were evaluated by histology and whole transcriptome sequencing. RESULTS: Tumor-infiltrating lymphocyte (TIL) levels were directly associated with relapse-free survival (RFS) and overall survival (OS) in resected tumors from Group A, in which early responses to the immunotherapy agent could be assessed. TIL levels were also directly associated with a previously reported immune signature, which was itself associated with RFS and OS, particularly in Group A tumors. In Group A tumors, there were decreased RNA transcripts of cell-cycling genes but increased RNA transcripts of repair and growth genes. Outcome (RFS and OS) was directly associated with several immune signatures and immune-related RNA transcripts and inversely associated with several tumor growth-associated transcripts, particularly in Group A tumors. Most of these associations were not seen in Group B tumors. CONCLUSIONS: We interpret these data to indicate that both immunologic and tumor cell processes, as measured by RNA-sequencing analyses shortly after initiation of hu14.18-IL2 therapy, are associated with long-term survival and could potentially serve as prognostic biomarkers in tumor resection specimens obtained after initiating neoadjuvant immunotherapy.

Subjects

Tumor Biomarkers , Melanoma/genetics , Melanoma/mortality , Monoclonal Antibodies/administration & dosage , Monoclonal Antibodies/adverse effects , Monoclonal Antibodies/therapeutic use , Computational Biology/methods , Female , Gene Expression Profiling , Neoplastic Gene Expression Regulation/drug effects , Humans , Interleukin-2/administration & dosage , Interleukin-2/adverse effects , Interleukin-2/therapeutic use , Kaplan-Meier Estimate , Tumor-Infiltrating Lymphocytes/immunology , Tumor-Infiltrating Lymphocytes/metabolism , Male , Melanoma/pathology , Melanoma/therapy , Neoplasm Staging , Prognosis , Proportional Hazards Models , Transcriptome , Treatment Outcome , Exome Sequencing

4.

Clinically Relevant Cytotoxic Immune Cell Signatures and Clonal Expansion of T-Cell Receptors in High-Risk MYCN-Not-Amplified Human Neuroblastoma.

Wei, Jun S; Kuznetsov, Igor B; Zhang, Shile; Song, Young K; Asgharzadeh, Shahab; Sindiri, Sivasish; Wen, Xinyu; Patidar, Rajesh; Najaraj, Sushma; Walton, Ashley; Auvil, Jaime M Guidry; Gerhard, Daniela S; Yuksel, Aysen; Catchpoole, Daniel; Hewitt, Stephen M; Sondel, Paul M; Seeger, Robert; Maris, John M; Khan, Javed.

Clin Cancer Res ; 24(22): 5673-5684, 2018 11 15.

Article in English | MEDLINE | ID: mdl-29784674

ABSTRACT

Purpose: High-risk neuroblastoma is an aggressive disease. DNA sequencing studies have revealed a paucity of actionable genomic alterations and a low mutation burden, posing challenges for the development of effective novel therapies. We used RNA sequencing (RNA-seq) to investigate the biology of this disease, with a focus on tumor-infiltrating lymphocytes (TIL). Experimental Design: We performed deep RNA-seq on pretreatment diagnostic tumors from 129 high-risk and 21 low- or intermediate-risk patients with neuroblastoma. We used single-sample gene set enrichment analysis to detect gene expression signatures of TILs in tumors and examined their association with clinical and molecular parameters, including patient outcome. The expression profiles of 190 additional pretreatment diagnostic neuroblastomas, a neuroblastoma tissue microarray, and T-cell receptor (TCR) sequencing were used to validate our findings. Results: We found that MYCN-not-amplified (MYCN-NA) tumors had significantly higher cytotoxic TIL signatures than MYCN-amplified (MYCN-A) tumors. A reported MYCN activation signature was significantly associated with poor outcome for high-risk patients with MYCN-NA tumors; however, a subgroup of these patients with elevated activated natural killer (NK) cell, CD8+ T cell, and cytolytic signatures showed improved outcome and expansion of infiltrating TCR clones. Furthermore, we observed upregulation of immune exhaustion marker genes, indicating an immune-suppressive microenvironment in these neuroblastomas. Conclusions: This study provides evidence that RNA signatures of cytotoxic TIL are associated with the presence of activated NK/T cells and improved outcomes in high-risk neuroblastoma patients harboring MYCN-NA tumors. Our findings suggest that these high-risk patients with MYCN-NA neuroblastoma may benefit from additional immunotherapies incorporated into current therapeutic strategies.

Subjects

Immunologic Cytotoxicity/genetics , N-Myc Proto-Oncogene Protein/genetics , Neuroblastoma/genetics , Neuroblastoma/immunology , T-Cell Antigen Receptors/genetics , T-Lymphocytes/immunology , T-Lymphocytes/metabolism , Tumor Cell Line , Preschool Child , Computational Biology/methods , Gene Amplification , Neoplastic Gene Expression Regulation , Humans , Infant , Newborn Infant , Neoplasm Staging , Neuroblastoma/pathology , Transcriptome

5.

Identification of non-random sequence properties in groups of signature peptides obtained in random sequence peptide microarray experiments.

Kuznetsov, Igor B.

Biopolymers ; 106(3): 318-29, 2016 May.

Article in English | MEDLINE | ID: mdl-27037995

ABSTRACT

Immunosignaturing is an emerging experimental technique that uses random sequence peptide microarrays to detect antibodies produced by the immune system in response to a particular disease. Two important questions regarding immunosignaturing are "Do microarray peptides that exhibit a strong affinity to a given type of antibody share common sequence properties?" and "If so, what are those properties?" In this work, three statistical tests designed to detect non-random patterns in the amino acid makeup of a group of microarray peptides are presented. One test detects patterns of significantly biased amino acid usage, whereas the other two detect patterns of significant bias in biochemical properties. These tests do not require a large number of peptides per group. The tests were applied to 19 groups of peptides identified in immunosignaturing experiments as specific for antibodies produced in response to various types of cancer and other diseases. The positional distribution of the biochemical properties of the amino acids in these 19 peptide groups was also studied. Remarkably, despite the random nature of the sequence libraries used to design the microarrays, a unique group-specific non-random pattern was identified in the majority of the peptide groups studied.

Subjects

Antibodies/analysis , Statistical Models , Neoplasms/diagnosis , Neoplasms/immunology , Peptide Library , Amino Acid Motifs , Antibody Affinity , Humans , Immunoassay/instrumentation , Immunoassay/statistics & numerical data , Neoplasms/classification , Neoplasms/genetics , Protein Array Analysis , Protein Binding
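
One simple way to ask whether a group of peptides shows biased amino acid usage is to compare its pooled residue counts with what the library composition predicts. The sketch below uses a Pearson chi-square statistic with a Monte Carlo null distribution, which stays valid when a group contains only a few peptides. It is a generic stand-in, not one of the three tests described in the paper, and the default uniform 20-letter background is an assumption about the library design.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def chi_square(counts, total, background):
    """Pearson chi-square of observed residue counts against background frequencies."""
    return sum((counts.get(aa, 0) - total * p) ** 2 / (total * p)
               for aa, p in background.items())


def composition_bias_pvalue(peptides, background=None, n_sim=2000, seed=0):
    """Monte Carlo p-value for biased amino acid usage in a group of peptides.

    Null model (an assumption): every residue is drawn independently from
    `background`, which defaults to uniform usage of the 20 amino acids.
    """
    if background is None:
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    residues = "".join(peptides)
    total = len(residues)
    observed = chi_square(Counter(residues), total, background)

    rng = random.Random(seed)
    alphabet = list(background)
    weights = [background[aa] for aa in alphabet]
    hits = 0
    for _ in range(n_sim):
        simulated = Counter(rng.choices(alphabet, weights=weights, k=total))
        if chi_square(simulated, total, background) >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_sim + 1)


# Hypothetical peptide group rich in K and R (not data from the study).
group = ["KRKRWKRK", "RKKRAKRR", "KKRRKRYK"]
stat, p = composition_bias_pvalue(group)
print(f"chi-square = {stat:.1f}, Monte Carlo p = {p:.4f}")
```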

6.

PR2ALIGN: a stand-alone software program and a web-server for protein sequence alignment using weighted biochemical properties of amino acids.

Kuznetsov, Igor B; McDuffie, Michael.

BMC Res Notes ; 8: 187, 2015 May 07.

Article in English | MEDLINE | ID: mdl-25947299

ABSTRACT

BACKGROUND: Alignment of amino acid sequences is the main sequence comparison method used in computational molecular biology. Selecting the amino acid substitution matrix best suited to a given alignment problem is one of the most important decisions the user has to make. In a conventional amino acid substitution matrix, all elements are fixed and their values cannot be easily adjusted. Moreover, most existing amino acid substitution matrices account for the average (dis)similarities between amino acid types and do not distinguish the contribution of a specific biochemical property to these (dis)similarities. FINDINGS: PR2ALIGN is a stand-alone software program and a web-server that let users implement flexible, user-specified alignment scoring functions and align pairs of amino acid sequences by comparing the profiles of biochemical properties of these sequences. Unlike conventional sequence alignment methods that use fixed 20x20 amino acid substitution matrices, PR2ALIGN uses a set of weighted biochemical properties of amino acids to measure the distance between pairs of aligned residues and to find an optimal minimal-distance global alignment. The user can provide any number of amino acid properties and specify a weight for each property; the higher the weight for a given property, the more that property affects the final alignment. We show that in many cases the approach implemented in PR2ALIGN produces better-quality pairwise alignments than the conventional matrix-based approach. CONCLUSIONS: PR2ALIGN will be helpful for researchers who wish to align amino acid sequences using flexible, user-specified alignment scoring functions based on the biochemical properties of amino acids instead of an amino acid substitution matrix. To the best of the authors' knowledge, there are no existing stand-alone software programs or web-servers analogous to PR2ALIGN.
The software is freely available from http://pr2align.rit.albany.edu.

Subjects

Algorithms , Amino Acids/chemistry , Computational Biology/methods , Sequence Alignment/methods , Software , Amino Acid Sequence , Amino Acid Substitution , Computational Biology/instrumentation , Molecular Sequence Data , Sequence Alignment/statistics & numerical data
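
The idea described in this abstract can be sketched as a global (Needleman-Wunsch-style) dynamic program that minimizes total distance, where the distance between two aligned residues is a weighted sum of differences in user-chosen property values. The absolute-difference distance, the linear gap cost, and the weights are assumptions for illustration, not PR2ALIGN's actual scoring. The hydropathy values come from the Kyte-Doolittle scale; the charge column is a simplified +1/-1/0 assignment.

```python
def residue_distance(a, b, properties, weights):
    """Weighted sum of absolute property differences between residues a and b."""
    return sum(w * abs(properties[name][a] - properties[name][b])
               for name, w in weights.items())


def min_distance_alignment(s, t, properties, weights, gap=1.0):
    """Global alignment minimizing total distance (Needleman-Wunsch recursion).

    Returns (total_distance, aligned_s, aligned_t); `gap` is a linear cost per gap position.
    """
    n, m = len(s), len(t)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + gap
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + gap

    def diag(i, j):
        return D[i - 1][j - 1] + residue_distance(s[i - 1], t[j - 1], properties, weights)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(diag(i, j), D[i - 1][j] + gap, D[i][j - 1] + gap)

    # Trace back one optimal path from the bottom-right cell.
    a, b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == diag(i, j):
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            a.append(s[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(t[j - 1])
            j -= 1
    return D[n][m], "".join(reversed(a)), "".join(reversed(b))


# Kyte-Doolittle hydropathy plus a simplified net charge, for a few residues only.
properties = {
    "hydropathy": {"A": 1.8, "D": -3.5, "E": -3.5, "K": -3.9, "R": -4.5, "I": 4.5, "L": 3.8, "V": 4.2},
    "charge": {"A": 0, "D": -1, "E": -1, "K": 1, "R": 1, "I": 0, "L": 0, "V": 0},
}
weights = {"hydropathy": 0.5, "charge": 2.0}  # hypothetical user-chosen weights
print(min_distance_alignment("LKDA", "IRDA", properties, weights, gap=3.0))
```

Raising the weight of one property makes mismatches in that property costlier, which is the knob the abstract describes.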

7.

Protein sequence alignment with family-specific amino acid similarity matrices.

Kuznetsov, Igor B.

BMC Res Notes ; 4: 296, 2011 Aug 16.

Article in English | MEDLINE | ID: mdl-21846354

ABSTRACT

BACKGROUND: Alignment of amino acid sequences by means of dynamic programming is a cornerstone sequence comparison method. The quality of alignments produced by dynamic programming critically depends on the choice of the alignment scoring function. Therefore, for a specific alignment problem one needs a way of selecting the best performing scoring function. This work is focused on the issue of finding optimized protein family- and fold-specific scoring functions for global similarity matrix-based sequence alignment. FINDINGS: I utilize a comprehensive set of reference alignments obtained from structural superposition of homologous and analogous proteins to design a quantitative statistical framework for evaluating the performance of alignment scoring functions in global pairwise sequence alignment. This framework is applied to study how existing general-purpose amino acid similarity matrices perform on individual protein families and structural folds, and to compare them to family-specific and fold-specific matrices derived in this work. I describe an adaptive alignment procedure that automatically selects an appropriate similarity matrix and optimized gap penalties based on the properties of the sequences being aligned. CONCLUSIONS: The results of this work indicate that using family-specific similarity matrices significantly improves the quality of the alignment of homologous sequences over the traditional sequence alignment based on a single general-purpose similarity matrix. However, using fold-specific similarity matrices can only marginally improve sequence alignment of proteins that share the same structural fold but do not share a common evolutionary origin. The family-specific matrices derived in this work and the optimized gap penalties are available at http://taurus.crc.albany.edu/fsm.
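
Reference alignments derived from structural superposition are commonly used to score a sequence alignment by the fraction of reference residue pairs it reproduces. The sketch below computes that fraction for one pair of sequences. It is a common accuracy measure chosen for illustration and is not necessarily the statistic used in the paper's framework.

```python
def aligned_pairs(row1, row2):
    """Set of (i, j) residue-index pairs matched in a gapped pairwise alignment ('-' = gap)."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    pairs, i, j = set(), 0, 0
    for x, y in zip(row1, row2):
        if x != "-" and y != "-":
            pairs.add((i, j))
        if x != "-":
            i += 1
        if y != "-":
            j += 1
    return pairs


def alignment_accuracy(test, reference):
    """Fraction of the reference alignment's residue pairs reproduced by the test alignment."""
    ref = aligned_pairs(*reference)
    if not ref:
        raise ValueError("reference alignment has no aligned residue pairs")
    return len(aligned_pairs(*test) & ref) / len(ref)


reference = ("ACDE-FG", "A-DEHFG")  # hypothetical structure-based alignment
test = ("ACDEFG", "ADEHFG")         # hypothetical gapless sequence alignment
print(alignment_accuracy(test, reference))  # 3 of 5 reference pairs recovered
```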

8.

Simplified computational methods for the analysis of protein flexibility.

Kuznetsov, Igor B.

Curr Protein Pept Sci ; 10(6): 607-13, 2009 Dec.

Article in English | MEDLINE | ID: mdl-19538139

ABSTRACT

Conformational flexibility is an inherent property of protein structure. Large-scale changes in protein conformation play a key role in a variety of fundamental biological activities and have been implicated in a number of diseases. The time scales of functionally relevant dynamic processes in proteins generally do not allow researchers to study them by means of detailed atomic-level simulations. Therefore, less computationally demanding methods based on coarse-grained models of protein structure, together with bioinformatics approaches, are particularly important for flexibility-related studies. This review focuses on two broad categories of protein flexibility: protein disorder and conformational switches. In the case of protein disorder, a flexible protein segment or an entire protein is structurally disordered, meaning that it does not have a well-defined folded 3D structure. In the case of conformational switches, the protein backbone of a flexible segment can change, or "switch", from one specific folded 3D conformation to another. This review discusses the relative strengths and limitations of existing computational tools, mostly from the bioinformatics domain, used to study and predict protein disorder and conformational switches, and highlights the main challenges.

Subjects

Computational Biology/methods , Protein Conformation , Protein Folding , Proteins/chemistry , X-Ray Crystallography , Genomics/methods , Magnetic Resonance Spectroscopy , Protein Tertiary Structure , Proteins/genetics , Proteomics/methods

9.

A web server for inferring the human N-acetyltransferase-2 (NAT2) enzymatic phenotype from NAT2 genotype.

Kuznetsov, Igor B; McDuffie, Michael; Moslehi, Roxana.

Bioinformatics ; 25(9): 1185-6, 2009 May 01.

Article in English | MEDLINE | ID: mdl-19261719

ABSTRACT

UNLABELLED: N-acetyltransferase-2 (NAT2) is an important enzyme that catalyzes the acetylation of aromatic and heterocyclic amine carcinogens. Individuals in human populations are divided into three NAT2 acetylator phenotypes: slow, rapid and intermediate. NAT2PRED is a web server that implements a supervised pattern recognition method to infer NAT2 phenotype from SNPs found at NAT2 gene positions 282, 341, 481, 590, 803 and 857. The web server can be used for fast determination of NAT2 phenotypes in genetic screens. AVAILABILITY: Freely available at http://nat2pred.rit.albany.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Arylamine N-Acetyltransferase/genetics , Single Nucleotide Polymorphism , Software , Arylamine N-Acetyltransferase/classification , Arylamine N-Acetyltransferase/metabolism , Genotype , Humans , Internet , Phenotype , Substrate Specificity
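
The abstract names the six SNP positions but not the classifier. The general idea, encoding a genotype as a fixed-order vector and passing it to a supervised classifier, can be sketched as follows. k-nearest neighbors is swapped in for illustration, the variant-allele-count encoding is an assumption, and the training vectors are made-up toy values with no biological meaning; this is not NAT2PRED's method.

```python
from collections import Counter

NAT2_POSITIONS = (282, 341, 481, 590, 803, 857)  # SNP positions named in the abstract


def encode(genotype):
    """Turn {position: variant-allele count (0, 1 or 2)} into a fixed-order feature vector.

    Positions absent from `genotype` are assumed to carry no variant allele.
    """
    return tuple(genotype.get(pos, 0) for pos in NAT2_POSITIONS)


def knn_predict(train, genotype, k=3):
    """Majority phenotype among the k training vectors closest in L1 distance."""
    x = encode(genotype)
    ranked = sorted(train, key=lambda item: sum(abs(a - b) for a, b in zip(item[0], x)))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up toy training set in NAT2_POSITIONS order; no biological meaning.
train = [
    ((0, 0, 0, 0, 0, 0), "rapid"),
    ((0, 0, 0, 0, 0, 1), "rapid"),
    ((1, 1, 1, 0, 0, 0), "intermediate"),
    ((0, 1, 1, 1, 0, 0), "intermediate"),
    ((2, 2, 2, 0, 0, 0), "slow"),
    ((0, 2, 2, 2, 0, 0), "slow"),
]
print(knn_predict(train, {282: 2, 341: 2, 481: 2, 590: 1}))
```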

10.

CFP: a web-server for constructing sequence-based protein conformational flexibility profiles.

Kuznetsov, Igor B; Rackovsky, Shalom.

Bioinformation ; 4(5): 176-8, 2009 Oct 19.

Article in English | MEDLINE | ID: mdl-20461153

ABSTRACT

UNLABELLED: Many proteins contain conformationally flexible segments that undergo significant changes in backbone conformation or completely lack a well-defined conformation. We previously developed the generalized local propensity (GLP), a quantitative sequence-based measure of protein backbone flexibility. In this paper, we present the CFP (Conformational Flexibility Profile) web-server, which constructs the GLP flexibility profile for a user-submitted sequence and uses this profile to identify segments with high backbone flexibility. The statistical significance of a flexible sequence segment is assessed using discrete scan statistics based on the density of flexible residues observed in that segment. AVAILABILITY: CFP is publicly available at http://cfp.rit.albany.edu.
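
A discrete scan statistic of the kind mentioned above asks how surprising the densest window of flexible residues is. The sketch below assumes residues have already been flagged as flexible (1) or not (0), for example by thresholding a GLP profile (a hypothetical step here), and estimates significance by simulation under an independent-residue null model. The abstract does not describe how CFP evaluates the scan-statistic distribution; simulation is used here only because it is short and easy to check.

```python
import random


def max_window_count(flags, w):
    """Discrete scan statistic: the largest number of flagged residues in any window of length w."""
    if not 0 < w <= len(flags):
        raise ValueError("window length must be between 1 and the sequence length")
    count = best = sum(flags[:w])
    for i in range(w, len(flags)):
        count += flags[i] - flags[i - w]  # slide the window one residue to the right
        best = max(best, count)
    return best


def scan_pvalue(flags, w, n_sim=2000, seed=1):
    """Monte Carlo estimate of P(scan statistic >= observed) under an i.i.d. null.

    Null model (an assumption): each residue is flagged independently with
    probability equal to the observed fraction of flagged residues.
    """
    observed = max_window_count(flags, w)
    n = len(flags)
    p = sum(flags) / n
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        simulated = [1 if rng.random() < p else 0 for _ in range(n)]
        if max_window_count(simulated, w) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)


# Hypothetical 0/1 flexibility flags with a dense cluster of 8 flexible residues.
flags = [0] * 25 + [1] * 8 + [0] * 27
print(scan_pvalue(flags, w=10))
```

Because the null distribution is that of the maximum over all windows, the p-value already accounts for having looked at every window position, which is the point of using a scan statistic.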

11.

ProBias: a web-server for the identification of user-specified types of compositionally biased segments in protein sequences.

Kuznetsov, Igor B.

Bioinformatics ; 24(13): 1534-5, 2008 Jul 01.

Article in English | MEDLINE | ID: mdl-18480099

ABSTRACT

UNLABELLED: Most proteins contain compositionally biased segments (CBS) in which one or more amino acid types are significantly overrepresented. CBS that contain amino acids with similar chemical properties can have functional and structural importance. This article describes ProBias, a web-server that searches a protein sequence for CBS composed of user-specified amino acid types. ProBias uses discrete scan statistics to estimate the statistical significance of CBS and is able to detect even subtle local deviations from the random independence model. The web-server also analyzes the global compositional bias of the input sequence. For novel proteins that lack functional annotation, statistically significant CBS reported by ProBias can be used to guide the search for potentially functionally important sites or domains. AVAILABILITY: Freely available at http://lcg.rit.albany.edu/ProBias. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms , Internet , Proteins/chemistry , Sequence Alignment/methods , Protein Sequence Analysis/methods , Software , Amino Acid Sequence , Molecular Sequence Data , Protein Tertiary Structure
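
The simplest building block behind searching for segments enriched in user-specified residue types is a binomial tail probability for a single window under the random independence model. The sketch below deliberately shows only this uncorrected per-window value, to make clear what a scan-statistic correction (as used by ProBias) adds when many overlapping windows are examined. The window length, the threshold `alpha`, and the uniform background are assumptions for illustration.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def biased_windows(seq, residue_set, w, background, alpha=1e-3):
    """Windows of length w enriched in residue_set, scored by a one-window binomial tail.

    background: probability that a residue belongs to residue_set under the
    random independence model. This per-window p-value is NOT corrected for
    scanning many overlapping windows, which is what scan statistics address.
    """
    hits = []
    for start in range(len(seq) - w + 1):
        k = sum(1 for aa in seq[start:start + w] if aa in residue_set)
        pval = binomial_tail(k, w, background)
        if pval <= alpha:
            hits.append((start, k, pval))
    return hits


# Hypothetical sequence with a Q/N-rich stretch; background assumes uniform usage of 20 residues.
seq = "MSTAG" + "QQQNQQQQNQ" + "LKVTA"
for start, k, pval in biased_windows(seq, set("QN"), w=10, background=2 / 20):
    print(start, k, f"{pval:.2e}")
```

Using the sequence's own frequency of the chosen residues as `background`, instead of a uniform value, is one way to take global compositional bias into account.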

12.

Ordered conformational change in the protein backbone: prediction of conformationally variable positions from sequence and low-resolution structural data.

Kuznetsov, Igor B.

Proteins ; 72(1): 74-87, 2008 Jul.

Article in English | MEDLINE | ID: mdl-18186479

ABSTRACT

Ordered conformational changes are an important structural property of proteins and are involved in a variety of fundamental biological activities. Large-scale analyses of the implications of such changes for protein function and dysfunction require efficient methods for automated recognition of conformationally variable residue positions. The goal of this work was to study the sequence and low-resolution structural properties of residue positions that change backbone conformation upon changes in the protein environment, and the utility of these properties for automated recognition of such conformationally variable positions. The study was performed on a large nonredundant set of experimentally characterized proteins that undergo ordered conformational transitions, obtained from the Database of Macromolecular Movements. The results show that ordered changes in backbone conformation are not limited to solvent-accessible loop regions: a considerable fraction of conformationally variable positions is observed in helices and strands, and at buried positions. Conformationally variable positions are less conserved in evolution. Local patterns of (a) sequence neighbors, (b) evolutionary conservation, and (c) solvent accessibility can be used to predict conformationally variable positions with balanced sensitivity and specificity, albeit with large variance at the level of individual proteins. However, including a pattern of secondary structure in the prediction scheme results in highly unbalanced performance, in which all conformationally variable positions located in regular secondary structure are misclassified. Application of the present methodology to the prion protein (PrP) shows that conformationally variable positions predicted in its ordered C-terminal domain are located within segments presumed to be involved in refolding of PrP.

Subjects

Proteins/chemistry , Amino Acid Sequence , Conserved Sequence , Molecular Evolution , Humans , Molecular Sequence Data , Prions/chemistry , Protein Secondary Structure , ROC Curve , Solvents
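
Using local patterns of sequence neighbors, conservation, and solvent accessibility usually means describing each residue by the features of a window centered on it. The sketch below builds such window vectors from any per-residue feature list. The half-width, the zero padding at chain ends, and the example features are assumptions rather than the encoding used in the paper.

```python
def window_features(per_residue, half_width, pad=0.0):
    """Build one feature vector per residue from its local sequence neighborhood.

    per_residue: list of per-position feature lists, e.g.
        [conservation, relative_solvent_accessibility] for each residue.
    Each output vector concatenates the features of positions
    i - half_width .. i + half_width; positions beyond the chain ends are filled with `pad`.
    """
    n = len(per_residue)
    if n == 0:
        return []
    dim = len(per_residue[0])
    vectors = []
    for i in range(n):
        vec = []
        for j in range(i - half_width, i + half_width + 1):
            vec.extend(per_residue[j] if 0 <= j < n else [pad] * dim)
        vectors.append(vec)
    return vectors


# Hypothetical per-residue features: [conservation score, relative solvent accessibility].
features = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
for vec in window_features(features, half_width=1):
    print(vec)
```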

13.

FlexPred: a web-server for predicting residue positions involved in conformational switches in proteins.

Kuznetsov, Igor B; McDuffie, Michael.

Bioinformation ; 3(3): 134-6, 2008.

Article in English | MEDLINE | ID: mdl-19238251

ABSTRACT

UNLABELLED: Conformational switches observed in the protein backbone play a key role in a variety of fundamental biological activities. This paper describes a web-server that implements a pattern recognition algorithm, trained on examples from the Database of Macromolecular Movements, to predict residue positions involved in conformational switches. Prediction can be performed at an adjustable false positive rate using either a user-supplied protein sequence in FASTA format or a structure in a Protein Data Bank (PDB) file. If a protein sequence is submitted, the web-server uses sequence-derived information only (such as evolutionary conservation of residue positions). If a PDB file is submitted, the web-server uses sequence-derived information and residue solvent accessibility calculated from this file. AVAILABILITY: FlexPred is publicly available at http://flexpred.rit.albany.edu.
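
An adjustable false positive rate can be implemented by calibrating a score cutoff on known negative positions. The sketch below picks the smallest cutoff whose false positive rate on those negatives does not exceed the requested value. This is one plausible calibration scheme, assumed for illustration; the abstract does not say how FlexPred sets its thresholds, and all scores are hypothetical.

```python
import math


def threshold_for_fpr(negative_scores, target_fpr):
    """Smallest cutoff t such that the fraction of negatives scoring above t is <= target_fpr.

    Positions with score > t are predicted positive. negative_scores come from
    known negatives, e.g. residues not involved in conformational switches.
    """
    if not 0.0 <= target_fpr <= 1.0:
        raise ValueError("target_fpr must be in [0, 1]")
    ranked = sorted(negative_scores, reverse=True)
    allowed = math.floor(target_fpr * len(ranked))  # negatives permitted above the cutoff
    if allowed >= len(ranked):
        return float("-inf")  # every position may be called positive
    return ranked[allowed]


# Hypothetical classifier scores for known negative positions.
negatives = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
cutoff = threshold_for_fpr(negatives, target_fpr=0.1)
query_scores = [0.95, 0.4, 0.85, 0.1]  # hypothetical scores for a new protein
print(cutoff, [i for i, s in enumerate(query_scores) if s > cutoff])
```

Lowering `target_fpr` raises the cutoff, trading sensitivity for fewer false alarms.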

14.

On the Accuracy of Sequence-Based Computational Inference of Protein Residues Involved in Interactions with DNA.

Gou, Zhenkun; Kuznetsov, Igor B.

Trends Appl Sci Res ; 3(4): 285-291, 2008 Dec 01.

Article in English | MEDLINE | ID: mdl-20209034

ABSTRACT

Methods for computational inference of DNA-binding residues in DNA-binding proteins are usually developed using classification techniques trained to distinguish between binding and non-binding residues on the basis of known examples observed in experimentally determined high-resolution structures of protein-DNA complexes. What degree of accuracy can be expected when a computational method is applied to a particular novel protein remains largely unknown. We test the utility of classification methods using Kernel Logistic Regression (KLR) predictors of DNA-binding residues as an example. We show that predictors that use sequence properties of proteins can successfully predict DNA-binding residues in proteins from a novel structural class. We use Multiple Linear Regression (MLR) to establish a quantitative relationship between protein properties and the expected accuracy of KLR predictors. The present results indicate that, for novel proteins, the expected accuracy provided by an MLR model is close to the actual accuracy and can be used to assess the overall confidence of the prediction.

15.

DP-Bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins.

Hwang, Seungwoo; Gou, Zhenkun; Kuznetsov, Igor B.

Bioinformatics ; 23(5): 634-6, 2007 Mar 01.

Article in English | MEDLINE | ID: mdl-17237068

ABSTRACT

UNLABELLED: This article describes DP-Bind, a web server for predicting DNA-binding sites in a DNA-binding protein from its amino acid sequence. The web server implements three machine learning methods: support vector machine, kernel logistic regression and penalized logistic regression. Prediction can be performed using either the input sequence alone or an automatically generated profile of evolutionary conservation of the input sequence in the form of a PSI-BLAST position-specific scoring matrix (PSSM). PSSM-based kernel logistic regression achieves an accuracy of 77.2%, a sensitivity of 76.4% and a specificity of 76.6%. The outputs of the three individual methods are combined into a consensus prediction to help identify positions predicted with a high level of confidence. AVAILABILITY: Freely available at http://lcg.rit.albany.edu/dp-bind. SUPPLEMENTARY INFORMATION: http://lcg.rit.albany.edu/dp-bind/dpbind_supplement.html.

Subjects

Artificial Intelligence , DNA-Binding Proteins/chemistry , Protein Sequence Analysis/methods , Software , Base Sequence , Binding Sites , Computational Biology , Computers , DNA-Binding Proteins/metabolism , Internet , Logistic Models
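
Accuracy, sensitivity, and specificity have standard definitions in terms of true and false positives and negatives, and a consensus across the three methods can be pictured as a per-residue vote. In the sketch below the metrics are pooled over residues, and both the majority rule and treating unanimous agreement as "high confidence" are assumptions; the abstract does not spell out DP-Bind's consensus rule. All labels and predictions are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for 0/1 labels (1 = DNA-binding residue)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def consensus(*predictions):
    """Per-residue majority vote, plus a flag marking residues on which all predictors agree."""
    result = []
    for votes in zip(*predictions):
        label = 1 if 2 * sum(votes) > len(votes) else 0
        result.append((label, len(set(votes)) == 1))
    return result


# Hypothetical labels and predictions for eight residues.
truth = [1, 1, 0, 0, 0, 1, 0, 0]
svm = [1, 0, 0, 0, 1, 1, 0, 0]
klr = [1, 1, 0, 1, 0, 1, 0, 0]
plr = [1, 1, 0, 0, 0, 0, 0, 1]
print(classification_metrics(truth, svm))
combined = consensus(svm, klr, plr)
print(classification_metrics(truth, [label for label, _ in combined]))
```

With three predictors a strict majority always exists, so the vote never ties.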

16.

Using evolutionary and structural information to predict DNA-binding sites on DNA-binding proteins.

Kuznetsov, Igor B; Gou, Zhenkun; Li, Run; Hwang, Seungwoo.

Proteins ; 64(1): 19-27, 2006 Jul 01.

Article in English | MEDLINE | ID: mdl-16568445

ABSTRACT

Proteins that interact with DNA are involved in a number of fundamental biological activities such as DNA replication, transcription, and repair. Reliable identification of DNA-binding sites in DNA-binding proteins is important for functional annotation, site-directed mutagenesis, and modeling of protein-DNA interactions. We apply Support Vector Machine (SVM), a supervised pattern recognition method, to predict DNA-binding sites in DNA-binding proteins using the following features: amino acid sequence, profile of evolutionary conservation of sequence positions, and low-resolution structural information. We use a rigorous statistical approach to study the performance of predictors that use different combinations of features and how this performance is affected by the structural and sequence properties of proteins. Our results indicate that an SVM predictor based on a properly scaled profile of evolutionary conservation in the form of a position-specific scoring matrix (PSSM) significantly outperforms a PSSM-based neural network predictor. The highest accuracy is achieved by the SVM predictor that combines the profile of evolutionary conservation with low-resolution structural information. Our results also show that knowledge-based predictors of DNA-binding sites perform significantly better on proteins from the mainly-alpha structural class, and that the performance of these predictors is significantly correlated with certain structural and sequence properties of proteins. These observations suggest that it may be possible to assign a reliability index to the overall accuracy of the prediction of DNA-binding sites in any given protein using its sequence and structural properties. A web-server implementation of the predictors is freely available online at http://lcg.rit.albany.edu/dp-bind/.

Subjects

DNA-Binding Proteins/chemistry, DNA-Binding Proteins/metabolism, DNA/chemistry, DNA/metabolism, Amino Acid Sequence, Binding Sites, Databases, Nucleic Acid, Databases, Protein, Evolution, Molecular, Models, Molecular, Models, Theoretical, ROC Curve
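The abstract does not spell out how the PSSM is turned into SVM input. As a rough illustration only, the sketch below shows one common encoding: squash each log-odds score into (0, 1) with a logistic function, then concatenate the scaled rows in a sliding window around each residue. The logistic scaling, the window size (`half_window`), zero-padding at the chain ends, and the names `logistic` and `build_feature_matrix` are assumptions made for this example, not details taken from the paper or from the DP-Bind server.

```python
import math
from typing import List, Sequence

# One row per residue, each holding raw log-odds scores (typically 20 columns,
# one per amino acid type), e.g. as written by a PSI-BLAST search.
Pssm = Sequence[Sequence[float]]


def logistic(x: float) -> float:
    """Numerically stable 1 / (1 + exp(-x)); assumed scaling into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def build_feature_matrix(pssm: Pssm, half_window: int = 5) -> List[List[float]]:
    """Return one feature vector per residue: the scaled PSSM rows of the residue
    and of its neighbours within +/- half_window positions, concatenated.
    Positions that fall off either end of the chain are zero-padded."""
    if not pssm:
        return []
    scaled = [[logistic(score) for score in row] for row in pssm]
    padding = [0.0] * len(scaled[0])
    features: List[List[float]] = []
    for center in range(len(scaled)):
        vector: List[float] = []
        for pos in range(center - half_window, center + half_window + 1):
            vector.extend(scaled[pos] if 0 <= pos < len(scaled) else padding)
        features.append(vector)
    return features
```

Paired with per-residue binding/non-binding labels, such vectors could be passed to any SVM implementation, for example scikit-learn's `sklearn.svm.SVC` through `fit(X, y)`; that pairing is likewise an illustrative assumption rather than the published setup.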

17.

A novel sensitive method for the detection of user-defined compositional bias in biological sequences.

Kuznetsov, Igor B; Hwang, Seungwoo.

Bioinformatics ; 22(9): 1055-63, 2006 May 01.

Article in English | MEDLINE | ID: mdl-16500936

ABSTRACT

MOTIVATION: Most biological sequences contain compositionally biased segments in which one or more residue types are significantly overrepresented. The function and evolution of these segments are poorly understood. Usually, all types of compositionally biased segments are masked and ignored during sequence analysis. However, it has been shown for a number of proteins that biased segments that contain amino acids with similar chemical properties are involved in a variety of molecular functions and human diseases. A detailed large-scale analysis of the functional implications and evolutionary conservation of different compositionally biased segments requires a sensitive method capable of detecting user-specified types of compositional bias. RESULTS: We present BIAS, a novel sensitive method for the detection of compositionally biased segments composed of a user-specified set of residue types. BIAS uses discrete scan statistics, which provide a highly accurate correction for multiple testing, to compute analytical estimates of the significance of each compositionally biased segment. The method can take into account global compositional bias when computing analytical estimates of the significance of local clusters. BIAS is benchmarked against the SEG, SAPS and CAST programs. We also use BIAS to show that groups of proteins with the same biological function are significantly associated with particular types of compositionally biased segments.

Subjects

Algorithms, Artificial Intelligence, Sequence Alignment/methods, Sequence Analysis/methods, Bias, Conserved Sequence, Data Interpretation, Statistical, Pattern Recognition, Automated/methods, User-Computer Interface
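BIAS computes analytical significance estimates from discrete scan statistics. The sketch below is a deliberately simpler stand-in, not the BIAS algorithm: it computes the same kind of scan statistic (the largest number of user-specified residues in any window of fixed length) but estimates its significance by Monte Carlo shuffling of the sequence. Shuffling keeps the global composition fixed, and comparing against the maximum over all windows of each shuffle accounts for having scanned every window. The permutation approach, the function names, and the default of 999 shuffles are assumptions for illustration.

```python
import random
from typing import AbstractSet, Tuple


def max_window_count(seq: str, residues: AbstractSet[str], window: int) -> Tuple[int, int]:
    """Scan statistic: the largest number of target residues in any window of the
    given length, together with the 0-based start of the first such window."""
    if not 0 < window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    hits = [1 if aa in residues else 0 for aa in seq]
    count = sum(hits[:window])
    best, best_start = count, 0
    for start in range(1, len(seq) - window + 1):
        # Slide the window one position: add the new right end, drop the old left end.
        count += hits[start + window - 1] - hits[start - 1]
        if count > best:
            best, best_start = count, start
    return best, best_start


def scan_pvalue(seq: str, residues: AbstractSet[str], window: int,
                n_perm: int = 999, seed: int = 0) -> float:
    """Monte Carlo p-value of the observed scan statistic under random shuffling."""
    observed, _ = max_window_count(seq, residues, window)
    rng = random.Random(seed)
    chars = list(seq)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(chars)  # shuffling preserves the global residue composition
        if max_window_count("".join(chars), residues, window)[0] >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)
```

For instance, `scan_pvalue(seq, {"S", "T"}, window=20)` would ask whether any 20-residue window of `seq` is unusually rich in serine and threonine, given the sequence's overall composition.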

18.

Comparative computational analysis of prion proteins reveals two fragments with unusual structural properties and a pattern of increase in hydrophobicity associated with disease-promoting mutations.

Kuznetsov, Igor B; Rackovsky, Shalom.

Protein Sci ; 13(12): 3230-44, 2004 Dec.

Article in English | MEDLINE | ID: mdl-15557265

ABSTRACT

Prion diseases are a group of neurodegenerative disorders associated with conversion of a normal prion protein, PrP(C), into a pathogenic conformation, PrP(Sc). PrP(Sc) is thought to promote the conversion of PrP(C). The structure and stability of PrP(C) are well characterized, whereas little is known about the structure of PrP(Sc), what parts of PrP(C) undergo conformational transition, or how mutations facilitate this transition. We use a computational knowledge-based approach to analyze the intrinsic structural propensities of the C-terminal domain of PrP and gain insights into possible mechanisms of structural conversion. We compare the properties of PrP sequences to those of a PrP paralog, Doppel, and to the distributions of structural propensities observed in known protein structures from the Protein Data Bank. We show that the prion protein contains at least two sequence fragments with highly unusual intrinsic propensities, PrP(114-125) and helix B. No segments with unusual properties were found in Doppel protein, which is topologically identical to PrP but does not undergo structural rearrangements. Known disease-promoting PrP mutations form a statistically significant cluster in the region comprising helices B and C. Because of their unusual properties, PrP(114-125) and the C terminus of helix B may be considered primary candidates for sites involved in the conformational transition from PrP(C) to PrP(Sc). The results of our study also show that most PrP mutations associated with neurodegenerative disorders increase local hydrophobicity. We suggest that the observed increase in hydrophobicity may facilitate PrP-to-PrP and/or PrP-to-cofactor interactions, and thus promote structural conversion.

Subjects

Computational Biology, Mutation, Peptide Fragments/chemistry, Prion Diseases/genetics, Prions/chemistry, Prions/genetics, Amino Acid Sequence, Animals, Cattle, GPI-Linked Proteins, Humans, Hydrophobic and Hydrophilic Interactions, Mice, Molecular Sequence Data, Peptide Fragments/physiology, Protein Structure, Secondary, Sequence Alignment
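To make the hydrophobicity finding concrete, the sketch below measures how a single point mutation changes the mean hydrophobicity of a small window around the mutated position. It assumes the Kyte-Doolittle hydropathy scale and a +/- 3-residue window by default; the hydrophobicity measure and window used in the paper may differ, the function name is hypothetical, and the toy sequence in the usage note is not a PrP sequence.

```python
import re
from typing import Dict

# Kyte-Doolittle hydropathy values (higher = more hydrophobic); assumed scale.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Point mutations written as <wild type><1-based position><mutant>, e.g. "P4L".
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def local_hydrophobicity_change(seq: str, mutation: str, half_window: int = 3) -> float:
    """Mean hydropathy of the window around the mutated site, mutant minus wild type."""
    match = _MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wild, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} outside sequence of length {len(seq)}")
    i = position - 1  # convert the 1-based residue number to a 0-based index
    if seq[i] != wild:
        raise ValueError(f"expected {wild} at position {position}, found {seq[i]}")
    mutated = seq[:i] + mutant + seq[i + 1:]
    # Window boundaries are clipped at the chain ends.
    lo, hi = max(0, i - half_window), min(len(seq), i + half_window + 1)

    def window_mean(s: str) -> float:
        window = s[lo:hi]
        return sum(KYTE_DOOLITTLE[aa] for aa in window) / len(window)

    return window_mean(mutated) - window_mean(seq)
```

For the toy sequence `"GGGPGGG"`, `local_hydrophobicity_change("GGGPGGG", "P4L", half_window=1)` returns about 1.8: the window mean rises from -0.8 to 1.0, and a positive value means the mutant window is more hydrophobic.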

19.

Class-specific correlations between protein folding rate, structure-derived, and sequence-derived descriptors.

Kuznetsov, Igor B; Rackovsky, Shalom.

Proteins ; 54(2): 333-41, 2004 Feb 01.

Article in English | MEDLINE | ID: mdl-14696195

ABSTRACT

Small single-domain proteins that fold by simple two-state kinetics have been shown to exhibit a wide variation in their folding rates. It has been proposed that folding mechanisms in these proteins are largely determined by the native-state topology, and a significant correlation between folding rate and measures of the average topological complexity, such as relative contact order (RCO), has been reported. We perform a statistical analysis of folding rate and RCO in all three major structural classes (alpha, beta, and alpha/beta) of small two-state proteins and of RCO in groups of analogous and homologous small single-domain proteins with the same topology. We also study correlation between folding rate and the average physicochemical properties of amino acid sequences in two-state proteins. Our results indicate that 1) helical proteins have statistically distinguishable, class-specific folding rates; 2) RCO accounts for essentially all the variation of folding rate in helical proteins, but for only a part of the variation in beta-sheet-containing proteins; and 3) only a small fraction of the protein topologies studied show a topology-specific RCO. We also report a highly significant correlation between the folding rate and average intrinsic structural propensities of protein sequences. These results suggest that intrinsic structural propensities may be an important determinant of the rate of folding in small two-state proteins.

Subjects

Protein Folding, Proteins/chemistry, Proteins/classification, Databases, Protein, Kinetics, Protein Structure, Secondary, Protein Structure, Tertiary, Proteins/metabolism
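Relative contact order is commonly defined as RCO = (1 / (L * N)) * sum of |i - j| over the N residue pairs (i, j) that are in contact in the native structure of a chain of length L. The sketch below computes RCO from a precomputed contact list, plus a plain Pearson correlation coefficient that could relate RCO to log folding rates across a set of proteins. How contacts are defined (a heavy-atom distance cutoff of roughly 6 Å is a frequent choice) and which correlation statistic the authors used are not stated in the abstract, so both are assumptions here, as are the function names.

```python
import math
from typing import Iterable, Sequence, Tuple


def relative_contact_order(contacts: Iterable[Tuple[int, int]], length: int) -> float:
    """RCO = sum(|i - j|) / (L * N) over the N native contacts of an L-residue chain."""
    separations = [abs(i - j) for i, j in contacts]
    if not separations or length <= 0:
        raise ValueError("need at least one contact and a positive chain length")
    return sum(separations) / (length * len(separations))


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

For example, `relative_contact_order([(1, 5), (2, 8)], length=10)` gives (4 + 6) / (10 * 2) = 0.5. Given per-protein RCO values and folding rates, `pearson_r(rco_values, [math.log(k) for k in rates])` would give the correlation with ln(k_f); `rco_values` and `rates` are hypothetical inputs.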

20.

On the properties and sequence context of structurally ambivalent fragments in proteins.

Kuznetsov, Igor B; Rackovsky, S.

Protein Sci ; 12(11): 2420-33, 2003 Nov.

Article in English | MEDLINE | ID: mdl-14573856

ABSTRACT

The goal of this work is to characterize structurally ambivalent fragments in proteins. We have searched the Protein Data Bank and identified all structurally ambivalent peptides (SAPs) of length five or greater that exist in two different backbone conformations. The SAPs were classified into five distinct categories based on their structure. We propose a novel index that provides a quantitative measure of conformational variability of a sequence fragment. It measures the context-dependent width of the distribution of (phi,psi) dihedral angles associated with each amino acid type. This index was used to analyze the local structural propensity of both SAPs and the sequence fragments contiguous to them. We also analyzed type-specific amino acid composition, solvent accessibility, and overall structural properties of SAPs and their sequence context. We show that each type of SAP has an unusual, type-specific amino acid composition and, as a result, simultaneous intrinsic preferences for two distinct types of backbone conformation. All types of SAPs have lower sequence complexity than average. Fragments that adopt helical conformation in one protein and sheet conformation in another have the lowest sequence complexity and are sampled from a relatively limited repertoire of possible residue combinations. A statistically significant difference between two distinct conformations of the same SAP is observed not only in the overall structural properties of proteins harboring the SAP but also in the properties of its flanking regions and in the pattern of solvent accessibility. These results have implications for protein design and structure prediction.

Subjects

Peptide Fragments/chemistry, Proteins/chemistry, Amino Acid Sequence, Molecular Sequence Data, Protein Conformation, Sequence Analysis, Protein
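The abstract describes an index built from the width of the (phi, psi) distribution for each amino acid type. The published index is context-dependent; the sketch below is a simplified, context-free illustration of the idea only. Because dihedral angles wrap around at +/-180°, it measures width with the circular variance (1 minus the mean resultant length), averages the phi and psi variances per residue type, and scores a fragment by the mean width of its residues. The circular-variance measure, the equal weighting of phi and psi, and all function names are assumptions, not the authors' definition.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def circular_variance(angles_deg: Sequence[float]) -> float:
    """1 - R, where R is the mean resultant length of the angles on the unit circle.
    0 means all angles are identical; values near 1 mean they are widely spread."""
    if not angles_deg:
        raise ValueError("need at least one angle")
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / len(angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / len(angles_deg)
    return 1.0 - math.hypot(c, s)


def residue_type_width(records: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Average of the phi and psi circular variances for each amino acid type,
    given (amino_acid, phi_deg, psi_deg) observations."""
    phis: Dict[str, List[float]] = defaultdict(list)
    psis: Dict[str, List[float]] = defaultdict(list)
    for aa, phi, psi in records:
        phis[aa].append(phi)
        psis[aa].append(psi)
    return {aa: 0.5 * (circular_variance(phis[aa]) + circular_variance(psis[aa]))
            for aa in phis}


def fragment_index(fragment: str, widths: Dict[str, float]) -> float:
    """Mean per-type width over the residues of a sequence fragment."""
    if not fragment:
        raise ValueError("empty fragment")
    return sum(widths[aa] for aa in fragment) / len(fragment)
```

Circular statistics matter here: angles of 179° and -179° are only 2° apart, and `circular_variance([179, -179])` is close to 0, whereas an ordinary arithmetic spread would treat them as far apart.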
