1.
Res Sq ; 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-39011112

ABSTRACT

Critical evaluation of computational tools for predicting variant effects is important considering their increased use in disease diagnosis and driving molecular discoveries. In the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, a dataset of 28 STK11 rare variants (27 missense, 1 single amino acid deletion), identified in primary non-small cell lung cancer biopsies, was experimentally assayed to characterize computational methods from four participating teams and five publicly available tools. Predictors demonstrated a high level of performance on key evaluation metrics, measuring correlation with the assay outputs and separating loss-of-function (LoF) variants from wildtype-like (WT-like) variants. The best participant model, 3Cnet, performed competitively with well-known tools. Unique to this challenge was that the functional data was generated with both biological and technical replicates, thus allowing the assessors to realistically establish maximum predictive performance based on experimental variability. Three out of the five publicly available tools and 3Cnet approached the performance of the assay replicates in separating LoF variants from WT-like variants. Surprisingly, REVEL, an often-used model, achieved a comparable correlation with the real-valued assay output as that seen for the experimental replicates. Performing variant interpretation by combining the new functional evidence with computational and population data evidence led to 16 new variants receiving a clinically actionable classification of likely pathogenic (LP) or likely benign (LB). Overall, the STK11 challenge highlights the utility of variant effect predictors in biomedical sciences and provides encouraging results for driving research in the field of computational genome interpretation.

2.
Bioinformatics ; 40(Suppl 1): i428-i436, 2024 06 28.
Article in English | MEDLINE | ID: mdl-38940171

ABSTRACT

MOTIVATION: Cross-linking tandem mass spectrometry (XL-MS/MS) is an established analytical platform used to determine distance constraints between residues within a protein or from physically interacting proteins, thus improving our understanding of protein structure and function. To aid biological discovery with XL-MS/MS, it is essential that pairs of chemically linked peptides be accurately identified, a process that requires: (i) database search, that creates a ranked list of candidate peptide pairs for each experimental spectrum and (ii) false discovery rate (FDR) estimation, that determines the probability of a false match in a group of top-ranked peptide pairs with scores above a given threshold. Currently, the only available FDR estimation mechanism in XL-MS/MS is the target-decoy approach (TDA). However, despite its simplicity, TDA has both theoretical and practical limitations that impact the estimation accuracy and increase run time over potential decoy-free approaches (DFAs). RESULTS: We introduce a novel decoy-free framework for FDR estimation in XL-MS/MS. Our approach relies on multi-sample mixtures of skew normal distributions, where the latent components correspond to the scores of correct peptide pairs (both peptides identified correctly), partially incorrect peptide pairs (one peptide identified correctly, the other incorrectly), and incorrect peptide pairs (both peptides identified incorrectly). To learn these components, we exploit the score distributions of first- and second-ranked peptide-spectrum matches for each experimental spectrum and subsequently estimate FDR using a novel expectation-maximization algorithm with constraints. We evaluate the method on ten datasets and provide evidence that the proposed DFA is theoretically sound and a viable alternative to TDA owing to its good performance in terms of accuracy, variance of estimation, and run time. AVAILABILITY AND IMPLEMENTATION: https://github.com/shawn-peng/xlms.


Subjects
Algorithms , Protein Databases , Proteomics , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Proteomics/methods , Peptides/chemistry , Proteins/chemistry
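
The mixture-based FDR idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: normal distributions stand in for the skew normal family, and the weights and parameters are hypothetical values rather than ones learned by the constrained EM algorithm. The sketch only shows how an FDR follows from a fitted mixture: it is the share of probability mass above a score threshold that belongs to the partially incorrect and incorrect components.

```python
from statistics import NormalDist

# Hypothetical fitted mixture: (weight, score distribution, is_false_match).
# Normal components are an assumption; the abstract describes skew normals.
COMPONENTS = [
    (0.5, NormalDist(mu=10.0, sigma=2.0), False),  # both peptides correct
    (0.2, NormalDist(mu=6.0, sigma=2.0), True),    # one peptide incorrect
    (0.3, NormalDist(mu=3.0, sigma=2.0), True),    # both peptides incorrect
]


def mixture_fdr(threshold, components=COMPONENTS):
    """Estimate the FDR among matches scoring above `threshold`."""
    # Probability mass that each component places above the threshold.
    tails = [
        (weight * (1.0 - dist.cdf(threshold)), is_false)
        for weight, dist, is_false in components
    ]
    total = sum(mass for mass, _ in tails)
    false_mass = sum(mass for mass, is_false in tails if is_false)
    # With no mass above the threshold there are no accepted matches.
    return false_mass / total if total > 0 else 0.0
```

Raising the threshold in this toy mixture lowers the estimated FDR, because the false-match components sit at lower scores than the correct one.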
3.
Hum Genet ; 141(10): 1595-1613, 2022 Oct.
Article in English | MEDLINE | ID: mdl-34549350

ABSTRACT

Whole-exome and whole-genome sequencing studies in autism spectrum disorder (ASD) have identified hundreds of thousands of exonic variants. Only a handful of them, primarily loss-of-function variants, have been shown to increase the risk for ASD, while the contributory roles of other variants, including most missense variants, remain unknown. New approaches that combine tissue-specific molecular profiles with patients' genetic data can thus play an important role in elucidating the functional impact of exonic variation and improve understanding of ASD pathogenesis. Here, we integrate spatio-temporal gene co-expression networks from the developing human brain and protein-protein interaction networks to first reach accurate prioritization of ASD risk genes based on their connectivity patterns with previously known high-confidence ASD risk genes. We subsequently integrate these gene scores with variant pathogenicity predictions to further prioritize individual exonic variants based on the positive-unlabeled learning framework with gene- and variant-score calibration. We demonstrate that this approach discriminates among variants between cases and controls at the high end of the prediction range. Finally, we experimentally validate our top-scoring de novo mutation NP_001243143.1:p.Phe309Ser in the sodium/potassium-transporting ATPase ATP1A3 to disrupt protein binding with different partners.


Subjects
Autism Spectrum Disorder , Autistic Disorder , Adenosine Triphosphatases/genetics , Adenosine Triphosphatases/metabolism , Autism Spectrum Disorder/genetics , Autistic Disorder/genetics , Genetic Predisposition to Disease , Humans , Mutation , Potassium/metabolism , Sodium/metabolism , Sodium-Potassium-Exchanging ATPase/genetics
4.
Bioinformatics ; 36(Suppl_2): i745-i753, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381824

ABSTRACT

MOTIVATION: Accurate estimation of false discovery rate (FDR) of spectral identification is a central problem in mass spectrometry-based proteomics. Over the past two decades, target-decoy approaches (TDAs) and decoy-free approaches (DFAs) have been widely used to estimate FDR. TDAs use a database of decoy species to faithfully model score distributions of incorrect peptide-spectrum matches (PSMs). DFAs, on the other hand, fit two-component mixture models to learn the parameters of correct and incorrect PSM score distributions. While conceptually straightforward, both approaches lead to problems in practice, particularly in experiments that push instrumentation to the limit and generate low fragmentation-efficiency and low signal-to-noise-ratio spectra. RESULTS: We introduce a new decoy-free framework for FDR estimation that generalizes present DFAs while exploiting more search data in a manner similar to TDAs. Our approach relies on multi-component mixtures, in which score distributions corresponding to the correct PSMs, best incorrect PSMs and second-best incorrect PSMs are modeled by the skew normal family. We derive EM algorithms to estimate parameters of these distributions from the scores of best and second-best PSMs associated with each experimental spectrum. We evaluate our models on multiple proteomics datasets and a HeLa cell digest case study consisting of more than a million spectra in total. We provide evidence of improved performance over existing DFAs and improved stability and speed over TDAs without any performance degradation. We propose that the new strategy has the potential to extend beyond peptide identification and reduce the need for TDA on all analytical platforms. AVAILABILITY AND IMPLEMENTATION: https://github.com/shawn-peng/FDR-estimation. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteomics , Tandem Mass Spectrometry , Algorithms , Protein Databases , HeLa Cells , Humans , Peptides
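
For contrast with the decoy-free framework described above, the target-decoy baseline can be sketched in a few lines. This shows one common form of the estimator (decoy matches divided by target matches at or above a score threshold). It is not code from the linked repository, and the scores below are hypothetical.

```python
def tda_fdr(target_scores, decoy_scores, threshold):
    """Target-decoy FDR estimate at a given score threshold."""
    # Count matches that pass the threshold in each search.
    n_target = sum(score >= threshold for score in target_scores)
    n_decoy = sum(score >= threshold for score in decoy_scores)
    if n_target == 0:
        return 0.0  # nothing accepted, so nothing to be wrong about
    # Decoy hits approximate the number of incorrect target hits.
    return min(1.0, n_decoy / n_target)


# Hypothetical PSM scores from a target search and a decoy search.
targets = [9.1, 8.4, 7.7, 6.9, 5.2, 4.8, 3.3, 2.1]
decoys = [5.0, 3.9, 3.1, 2.5, 1.8, 1.2, 0.9, 0.4]
```

At a threshold of 4.5, six target matches and one decoy match pass, so the estimate is 1/6. The estimate depends on a discrete decoy count, which is one source of the variance that decoy-free methods aim to avoid.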
5.
Hum Mutat ; 40(9): 1546-1556, 2019 09.
Article in English | MEDLINE | ID: mdl-31294896

ABSTRACT

Testing for variation in BRCA1 and BRCA2 (commonly referred to as BRCA1/2) has emerged as a standard clinical practice and is helping countless women better understand and manage their heritable risk of breast and ovarian cancer. Yet the increased rate of BRCA1/2 testing has led to an increasing number of Variants of Uncertain Significance (VUS), and the rate of VUS discovery currently outpaces the rate of clinical variant interpretation. Computational prediction is a key component of the variant interpretation pipeline. In the CAGI5 ENIGMA Challenge, six prediction teams submitted predictions on 326 newly-interpreted variants from the ENIGMA Consortium. By evaluating these predictions against the new interpretations, we have gained a number of insights on the state of the art of variant prediction and specific steps to further advance this state of the art.


Subjects
BRCA1 Protein/genetics , BRCA2 Protein/genetics , Breast Neoplasms/diagnosis , Computational Biology/methods , Ovarian Neoplasms/diagnosis , Breast Neoplasms/genetics , Early Detection of Cancer , Female , Genetic Predisposition to Disease , Genetic Testing , Genetic Variation , Humans , Genetic Models , Ovarian Neoplasms/genetics
6.
Hum Mutat ; 40(9): 1530-1545, 2019 09.
Article in English | MEDLINE | ID: mdl-31301157

ABSTRACT

Accurate prediction of the impact of genomic variation on phenotype is a major goal of computational biology and an important contributor to personalized medicine. Computational predictions can lead to a better understanding of the mechanisms underlying genetic diseases, including cancer, but their adoption requires thorough and unbiased assessment. Cystathionine-beta-synthase (CBS) is an enzyme that catalyzes the first step of the transsulfuration pathway, from homocysteine to cystathionine, and in which variations are associated with human hyperhomocysteinemia and homocystinuria. We have created a computational challenge under the CAGI framework to evaluate how well different methods can predict the phenotypic effect(s) of CBS single amino acid substitutions using a blinded experimental data set. CAGI participants were asked to predict yeast growth based on the identity of the mutations. The performance of the methods was evaluated using several metrics. The CBS challenge highlighted the difficulty of predicting the phenotype of an ex vivo system in a model organism when classification models were trained on human disease data. We also discuss the variations in difficulty of prediction for known benign and deleterious variants, as well as identify methodological and experimental constraints with lessons to be learned for future challenges.


Subjects
Amino Acid Substitution , Computational Biology/methods , Cystathionine beta-Synthase/genetics , Cystathionine/metabolism , Cystathionine beta-Synthase/metabolism , Homocysteine/metabolism , Humans , Phenotype , Precision Medicine
7.
Hum Mutat ; 40(9): 1612-1622, 2019 09.
Article in English | MEDLINE | ID: mdl-31241222

ABSTRACT

The availability of disease-specific genomic data is critical for developing new computational methods that predict the pathogenicity of human variants and advance the field of precision medicine. However, the lack of gold standards to properly train and benchmark such methods is one of the greatest challenges in the field. In response to this challenge, the scientific community is invited to participate in the Critical Assessment for Genome Interpretation (CAGI), where unpublished disease variants are available for classification by in silico methods. As part of the CAGI-5 challenge, we evaluated the performance of 18 submissions and three additional methods in predicting the pathogenicity of single nucleotide variants (SNVs) in checkpoint kinase 2 (CHEK2) for cases of breast cancer in Hispanic females. As part of the assessment, the efficacy of the analysis method and the setup of the challenge were also considered. The results indicated that though the challenge could benefit from additional participant data, the combined generalized linear model analysis and odds of pathogenicity analysis provided a framework to evaluate the methods submitted for SNV pathogenicity identification and for comparison to other available methods. The outcome of this challenge and the approaches used can help guide further advancements in identifying SNV-disease relationships.


Subjects
Breast Neoplasms/genetics , Checkpoint Kinase 2/genetics , Computational Biology/methods , Hispanic or Latino/genetics , Single Nucleotide Polymorphism , Adult , Aged , Breast Neoplasms/ethnology , Case-Control Studies , Computer Simulation , Female , Genetic Predisposition to Disease , Humans , Linear Models , Middle Aged , United States/ethnology , Exome Sequencing
8.
PLoS Comput Biol ; 15(6): e1007112, 2019 06.
Article in English | MEDLINE | ID: mdl-31199787

ABSTRACT

Differentiation between phenotypically neutral and disease-causing genetic variation remains an open and relevant problem. Among different types of variation, non-frameshifting insertions and deletions (indels) represent an understudied group with widespread phenotypic consequences. To address this challenge, we present a machine learning method, MutPred-Indel, that predicts pathogenicity and identifies types of functional residues impacted by non-frameshifting insertion/deletion variation. The model shows good predictive performance as well as the ability to identify impacted structural and functional residues including secondary structure, intrinsic disorder, metal and macromolecular binding, post-translational modifications, allosteric sites, and catalytic residues. We identify structural and functional mechanisms impacted preferentially by germline variation from the Human Gene Mutation Database, recurrent somatic variation from COSMIC in the context of different cancers, as well as de novo variants from families with autism spectrum disorder. Further, the distributions of pathogenicity prediction scores generated by MutPred-Indel are shown to differentiate highly recurrent from non-recurrent somatic variation. Collectively, we present a framework to facilitate the interrogation of both pathogenicity and the functional effects of non-frameshifting insertion/deletion variants. The MutPred-Indel webserver is available at http://mutpred.mutdb.org/.


Subjects
Genetic Predisposition to Disease/genetics , Human Genome , INDEL Mutation , Autism Spectrum Disorder/genetics , Autism Spectrum Disorder/physiopathology , Computational Biology , Genetic Databases , Human Genome/genetics , Human Genome/physiology , Humans , INDEL Mutation/genetics , INDEL Mutation/physiology , Machine Learning , ROC Curve
9.
Hum Mutat ; 40(9): 1495-1506, 2019 09.
Article in English | MEDLINE | ID: mdl-31184403

ABSTRACT

Thermodynamic stability is a fundamental property shared by all proteins. Changes in stability due to mutation are a widespread molecular mechanism in genetic diseases. Methods for the prediction of mutation-induced stability change have typically been developed and evaluated on incomplete and/or biased data sets. As part of the Critical Assessment of Genome Interpretation, we explored the utility of high-throughput variant stability profiling (VSP) assay data as an alternative for the assessment of computational methods and evaluated state-of-the-art predictors against over 7,000 nonsynonymous variants from two proteins. We found that predictions were modestly correlated with actual experimental values. Predictors fared better when evaluated as classifiers of extreme stability effects. While different methods emerged as top performers depending on the metric, it is nontrivial to draw conclusions on their adoption or improvement. Our analyses revealed that only 16% of all variants in VSP assays could be confidently defined as stability-affecting. Furthermore, it is unclear to what extent VSP abundance scores were reasonable proxies for the stability-related quantities that participating methods were designed to predict. Overall, our observations underscore the need for clearly defined objectives when developing and using both computational and experimental methods in the context of measuring variant impact.


Subjects
Computational Biology/methods , Methyltransferases/chemistry , Mutation , PTEN Phosphohydrolase/chemistry , High-Throughput Nucleotide Sequencing , Humans , Methyltransferases/genetics , PTEN Phosphohydrolase/genetics , Protein Stability
10.
Hum Mutat ; 38(9): 1182-1192, 2017 09.
Article in English | MEDLINE | ID: mdl-28634997

ABSTRACT

Precision medicine aims to predict a patient's disease risk and best therapeutic options by using that individual's genetic sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype-phenotype prediction challenges; participants build models, undergo assessment, and share key findings. For CAGI 4, three challenges involved using exome-sequencing data: Crohn's disease, bipolar disorder, and warfarin dosing. Previous CAGI challenges included prior versions of the Crohn's disease challenge. Here, we discuss the range of techniques used for phenotype prediction as well as the methods used for assessing predictive models. Additionally, we outline some of the difficulties associated with making predictions and evaluating them. The lessons learned from the exome challenges can be applied to both research and clinical efforts to improve phenotype prediction from genotype. In addition, these challenges serve as a vehicle for sharing clinical and research exome data in a secure manner with scientists who have a broad range of expertise, contributing to a collaborative effort to advance our understanding of genotype-phenotype relationships.


Subjects
Bipolar Disorder/genetics , Crohn Disease/genetics , Exome Sequencing/methods , Precision Medicine/methods , Warfarin/therapeutic use , Computational Biology/methods , Genetic Databases , Genetic Predisposition to Disease , Humans , Information Dissemination , Pharmacogenomic Variants , Phenotype , Warfarin/pharmacology
11.
J Biomol Struct Dyn ; 35(11): 2337-2350, 2017 Aug.
Article in English | MEDLINE | ID: mdl-27498722

ABSTRACT

Over the past 30 years, several hundred eukaryotic proteins spanning from yeast to man have been shown to be S-palmitoylated. This post-translational modification involves the reversible addition of a 16-carbon saturated fatty acyl chain onto the cysteine residue of a protein where it regulates protein membrane association and distribution, conformation, and stability. However, the large-scale proteome-wide discovery of new palmitoylated proteins has been hindered by the difficulty of identifying a palmitoylation consensus sequence. Using a bioinformatics approach, we show that the enrichment of hydrophobic and basic residues, the cellular context of the protein, and the structural features of the residues surrounding the palmitoylated cysteine all influence the likelihood of palmitoylation. We developed a new palmitoylation predictor that incorporates these identified features, and this predictor achieves a Matthews Correlation Coefficient of 0.74 using 10-fold cross validation, and significantly outperforms existing predictors on unbiased testing sets. This demonstrates that palmitoylation sites can be predicted with accuracy by taking into account not only physicochemical properties of the modified cysteine and its surrounding residues, but also structural parameters and the subcellular localization of the modified cysteine. This will allow for improved predictions of palmitoylated residues in uncharacterized proteins. A web-based version of this predictor is currently under development.


Subjects
Cysteine/metabolism , Lipoylation , Post-Translational Protein Processing , Proteome/metabolism , Amino Acid Sequence , Binding Sites , Chemical Phenomena , Computational Biology/methods , Consensus Sequence , Cysteine/chemistry , Protein Databases , Membrane Proteins/chemistry , Membrane Proteins/metabolism , Proteome/chemistry
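
The abstract above reports a Matthews Correlation Coefficient. As a reminder of what that metric measures, here is a minimal sketch that computes it from the four cells of a binary confusion matrix. The function name and the convention of returning 0.0 for an empty row or column are choices made for this illustration and are not taken from the paper.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: an undefined ratio is reported as 0.0.
    return numerator / denominator if denominator else 0.0
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), and it remains informative when palmitoylated and non-palmitoylated cysteines are unevenly represented.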
12.
Am J Hum Genet ; 99(4): 877-885, 2016 Oct 06.
Article in English | MEDLINE | ID: mdl-27666373

ABSTRACT

The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10⁻¹²) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies <0.5%. The area under the receiver operating characteristic curve (AUC) for REVEL was 0.046-0.182 higher in an independent test set of 935 recent SwissVar disease variants and 123,935 putatively neutral exome sequencing variants and 0.027-0.143 higher in an independent test set of 1,953 pathogenic and 2,406 benign variants recently reported in ClinVar than the AUCs for other ensemble methods. We provide pre-computed REVEL scores for all possible human missense variants to facilitate the identification of pathogenic variants in the sea of rare variants discovered as sequencing studies expand in scale.


Subjects
Disease/genetics , Missense Mutation/genetics , Software , Area Under Curve , DNA Mutational Analysis , Exome/genetics , Gene Frequency , Humans , ROC Curve
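
The AUC comparisons quoted above can be read through the rank interpretation of the metric: the AUC equals the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen neutral one. The sketch below computes it that way with a plain pairwise loop. It is an illustration only, with a hypothetical function name, and is not how the REVEL authors computed their values.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for pos in pos_scores:
        for neg in neg_scores:
            if pos > neg:
                wins += 1.0   # pair ranked correctly
            elif pos == neg:
                wins += 0.5   # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))
```

The quadratic loop is fine for a small illustration. For test sets the size of those in the abstract, a sort-based rank computation would be the practical choice.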
13.
J Proteome Res ; 15(6): 1830-41, 2016 06 03.
Article in English | MEDLINE | ID: mdl-27068484

ABSTRACT

Chemical cross-linking combined with mass spectrometric analysis has become an important technique for probing protein three-dimensional structure and protein-protein interactions. A key step in this process is the accurate identification and validation of cross-linked peptides from tandem mass spectra. The identification of cross-linked peptides, however, presents challenges related to the expanded nature of the search space (all pairs of peptides in a sequence database) and the fact that some peptide-spectrum matches (PSMs) contain one correct and one incorrect peptide but often receive scores that are comparable to those in which both peptides are correctly identified. To address these problems and improve detection of cross-linked peptides, we propose a new database search algorithm, XLSearch, for identifying cross-linked peptides. Our approach is based on a data-driven scoring scheme that independently estimates the probability of correctly identifying each individual peptide in the cross-link given knowledge of the correct or incorrect identification of the other peptide. These conditional probabilities are subsequently used to estimate the joint posterior probability that both peptides are correctly identified. Using the data from two previous cross-link studies, we show the effectiveness of this scoring scheme, particularly in distinguishing between true identifications and those containing one incorrect peptide. We also provide evidence that XLSearch achieves more identifications than two alternative methods at the same false discovery rate (availability: https://github.com/COL-IU/XLSearch ).


Subjects
Algorithms , Protein Databases , Peptides/analysis , Cross-Linking Reagents , Peptides/chemistry , Probability , Proteomics/methods , Tandem Mass Spectrometry
14.
ACS Chem Biol ; 10(11): 2529-36, 2015 Nov 20.
Article in English | MEDLINE | ID: mdl-26255674

ABSTRACT

Palmitoylation, a post-translational modification in which a saturated 16-carbon chain is added predominantly to a cysteine residue, participates in various biological functions. The position of proline relative to other residues being post-translationally modified has been previously reported as being important. We determined that proline is statistically enriched around cysteines known to be S-palmitoylated. The goal of this work was to determine how the position of proline influences the palmitoylation of the cysteine residue. We established a mass spectrometry-based approach to investigate time- and temperature-dependent kinetics of autopalmitoylation in vitro and to derive the thermodynamic parameters of the transition state associated with palmitoylation; to the best of our knowledge, our work is the first to study the kinetics and activation properties of the palmitoylation process. We then used these thermochemical parameters to determine if the position of proline relative to the modified cysteine is important for palmitoylation. Our results show that peptides with proline at the -1 position of cysteine in their sequence (PC) have lower enthalpic barriers and higher entropic barriers in comparison to the same peptides with proline at the +1 position of cysteine (CP); interestingly, the free-energy barriers for both pairs are almost identical. Molecular dynamics studies demonstrate that the flexibility of the cysteine backbone in the PC-containing peptide when compared to the CP-containing peptide explains the increased entropic barrier and decreased enthalpic barrier observed experimentally.


Subjects
Cysteine/chemistry , Molecular Models , Palmitic Acid/metabolism , Peptides/chemistry , Proline/chemistry , Lipoylation , Molecular Dynamics Simulation , Palmitic Acid/chemistry , Peptides/metabolism , Post-Translational Protein Processing
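
The observation above that the PC and CP peptides differ in their enthalpic and entropic barriers yet share nearly the same free-energy barrier is easier to follow with the standard transition-state relations written out. The abstract does not say which formalism the authors used, so the Eyring form below (with a transmission coefficient of 1) is an assumption.

```latex
k = \frac{k_{\mathrm{B}} T}{h}\,
    \exp\!\left(-\frac{\Delta G^{\ddagger}}{R T}\right),
\qquad
\Delta G^{\ddagger} = \Delta H^{\ddagger} - T\,\Delta S^{\ddagger}
```

Under these relations, a lower activation enthalpy can be offset by a less favorable activation entropy, which leaves the free-energy barrier and therefore the rate constant essentially unchanged at a given temperature.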
15.
J Am Soc Mass Spectrom ; 26(3): 444-52, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25503299

ABSTRACT

The influence of the position of the amino acid proline in polypeptide sequences is examined by a combination of ion mobility spectrometry-mass spectrometry (IMS-MS), amino acid substitutions, and molecular modeling. The results suggest that when proline exists as the second residue from the N-terminus (i.e., penultimate proline), two families of conformers are formed. We demonstrate the existence of these families by a study of a series of truncated and mutated peptides derived from the 11-residue peptide Ser(1)-Pro(2)-Glu(3)-Leu(4)-Pro(5)-Ser(6)-Pro(7)-Gln(8)-Ala(9)-Glu(10)-Lys(11). We find that every peptide from this sequence with a penultimate proline residue has multiple conformations. Substitution of Ala for Pro residues indicates that multiple conformers arise from the cis-trans isomerization of Xaa(1)-Pro(2) peptide bonds as Xaa-Ala peptide bonds are unlikely to adopt the cis isomer, and examination of spectra from a library of 58 peptides indicates that ~80% of sequences show this effect. A simple mechanism suggesting that the barrier between the cis- and trans-proline forms is lowered because of low steric impedance is proposed. This observation may have interesting biological implications as well, and we note that a number of biologically active peptides have penultimate proline residues.


Subjects
Proline/chemistry , Proteins/chemistry , Amino Acid Sequence , Isomerism , Mass Spectrometry , Molecular Models , Molecular Sequence Data , Protein Conformation
16.
Genome Biol ; 15(1): R19, 2014 Jan 13.
Article in English | MEDLINE | ID: mdl-24451234

ABSTRACT

We have developed a novel machine-learning approach, MutPred Splice, for the identification of coding region substitutions that disrupt pre-mRNA splicing. Applying MutPred Splice to human disease-causing exonic mutations suggests that 16% of mutations causing inherited disease and 10 to 14% of somatic mutations in cancer may disrupt pre-mRNA splicing. For inherited disease, the main mechanism responsible for the splicing defect is splice site loss, whereas for cancer the predominant mechanism of splicing disruption is predicted to be exon skipping via loss of exonic splicing enhancers or gain of exonic splicing silencer elements. MutPred Splice is available at http://mutdb.org/mutpredsplice.


Subjects
Alternative Splicing/genetics , Exons , Genetic Variation , Machine Learning , Tumor Suppressor Genes , Humans , Introns , Mutation , Missense Mutation , Neoplasms/genetics , Single Nucleotide Polymorphism , RNA Precursors/genetics , RNA Splice Sites/genetics , Transcriptional Silencer Elements/genetics
17.
Int J Mass Spectrom ; 368: 6-14, 2014 Jul 15.
Article in English | MEDLINE | ID: mdl-26023288

ABSTRACT

Cross sections for 61 palmitoylated peptides and 73 cysteine-unmodified peptides are determined and used together with a previously obtained tryptic peptide library to derive a set of intrinsic size parameters (ISPs) for the palmitoyl (Pal) group (1.26 ± 0.04), carboxyamidomethyl (Am) group (0.92 ± 0.04), and the 20 amino acid residues to assess the influence of Pal- and Am-modification on cysteine and other amino acid residues. These values highlight the influence of the intrinsic hydrophobic and hydrophilic nature of these modifications on the overall cross sections. As a part of this analysis, we find that ISPs derived from a database of a modifier on one amino acid residue (CysPal) can be applied on the same modification group on different amino acid residues (SerPal and TyrPal). Using these ISP values, we are able to calculate peptide cross sections to within ± 2% of experimental values for 83% of Pal-modified peptide ions and 63% of Am-modified peptide ions. We propose that modification groups should be treated as individual contribution factors, instead of treating the combination of the particular group and the amino acid residue they are on as a whole when considering their effects on the peptide ion mobility features.

18.
Mol Cell Proteomics ; 12(8): 2354-69, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23660473

ABSTRACT

Global phosphorylation changes in plants in response to environmental stress have been relatively poorly characterized to date. Here we introduce a novel mass spectrometry-based label-free quantitation method that facilitates systematic profiling of plant phosphoproteome changes with high efficiency and accuracy. This method employs synthetic peptide libraries tailored specifically as internal standards for complex phosphopeptide samples and, accordingly, a local normalization algorithm, LAXIC, which calculates phosphopeptide abundance normalized locally with co-eluting library peptides. Normalization was achieved in a small time frame centered on each phosphopeptide to compensate for the diverse ion suppression effect across retention time. The label-free LAXIC method was further treated with a linear regression function to accurately measure phosphoproteome responses to osmotic stress in Arabidopsis. Among 2027 unique phosphopeptides identified and 1850 quantified phosphopeptides in Arabidopsis samples, 468 regulated phosphopeptides representing 497 phosphosites have shown significant changes. Several known and novel components in the abiotic stress pathway were identified, illustrating the capability of this method to identify critical signaling events among dynamic and complex phosphorylation. Further assessment of those regulated proteins may help shed light on phosphorylation response to osmotic stress in plants.


Subjects
Arabidopsis/metabolism , Osmotic Pressure/physiology , Phosphopeptides/metabolism , Plant Proteins/metabolism , Proteomics/methods , Algorithms , Cell Line, Tumor , Humans , Mass Spectrometry , Peptide Library , Phosphorylation , Proteome
19.
Proteomics ; 13(5): 756-65, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23303707

ABSTRACT

Searching spectral libraries in MS/MS is an important new approach to improving the quality of peptide and protein identification. The idea relies on the observation that ion intensities in an MS/MS spectrum of a given peptide are generally reproducible across experiments, and thus, matching between spectra from an experiment and the spectra of previously identified peptides stored in a spectral library can lead to better peptide identification compared to the traditional database search. However, the use of libraries is greatly limited by their coverage of peptide sequences: even for well-studied organisms a large fraction of peptides have not been previously identified. To address this issue, we propose to expand spectral libraries by predicting the MS/MS spectra of peptides based on the spectra of peptides with similar sequences. We first demonstrate that the intensity patterns of dominant fragment ions between similar peptides tend to be similar. In accordance with this observation, we develop a neighbor-based approach that first selects peptides that are likely to have spectra similar to the target peptide and then combines their spectra using a weighted K-nearest neighbor method to accurately predict fragment ion intensities corresponding to the target peptide. This approach has the potential to predict spectra for every peptide in the proteome. When rigorous quality criteria are applied, we estimate that the method increases the coverage of spectral libraries available from the National Institute of Standards and Technology by 20-60%, although the values vary with peptide length and charge state. We find that the overall best search performance is achieved when spectral libraries are supplemented by the high quality predicted spectra.
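
The neighbor-based prediction described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the similarity measure (fraction of identical positions between equal-length peptides), the function names, and the toy intensity vectors are assumptions. The sketch selects the K most similar library peptides and averages their fragment ion intensities, weighted by similarity.

```python
def sequence_similarity(a, b):
    """Fraction of identical positions; 0.0 for peptides of different length."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def predict_intensities(target, library, k=3):
    """Predict fragment ion intensities for `target` by weighted K-nearest neighbors.

    library: dict mapping peptide sequence -> list of fragment ion intensities,
    assumed to be in the same ion order for peptides of the same length.
    Returns a list of predicted intensities, or None if no neighbor is usable.
    """
    scored = sorted(
        ((sequence_similarity(target, pep), pep) for pep in library),
        reverse=True,
    )
    # Keep at most k neighbors, dropping those with zero similarity.
    neighbors = [(s, pep) for s, pep in scored[:k] if s > 0]
    if not neighbors:
        return None
    total = sum(s for s, _ in neighbors)
    n_ions = len(library[neighbors[0][1]])
    return [
        sum(s * library[pep][i] for s, pep in neighbors) / total
        for i in range(n_ions)
    ]

# Toy library (intensities are made up and truncated to two ions for brevity).
toy_library = {
    "PEPTIDEK": [1.0, 0.5],
    "PEPTIDER": [0.5, 1.0],
    "AAAA": [0.1],
}
predicted = predict_intensities("PEPTIDEA", toy_library)
```

A real implementation would also have to account for charge state and would apply quality criteria before adding a predicted spectrum to the library, as the abstract notes.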


Subjects
Databases, Protein , High-Throughput Screening Assays/methods , Peptides/chemistry , Proteomics/methods , Tandem Mass Spectrometry/methods , Algorithms , Animals , Humans , Peptide Library
20.
Hum Mutat ; 34(1): 255-65, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22949387

ABSTRACT

Classification of rare missense substitutions observed during genetic testing for patient management is a considerable problem in clinical genetics. The Bayesian integrated evaluation of unclassified variants is a solution originally developed for BRCA1/2. Here, we take a step toward an analogous system for the mismatch repair (MMR) genes (MLH1, MSH2, MSH6, and PMS2) that confer colon cancer susceptibility in Lynch syndrome by calibrating in silico tools to estimate prior probabilities of pathogenicity for MMR gene missense substitutions. A qualitative five-class classification system was developed and applied to 143 MMR missense variants. This identified 74 missense substitutions suitable for calibration. These substitutions were scored using six different in silico tools (Align-Grantham Variation Grantham Deviation, multivariate analysis of protein polymorphisms [MAPP], MutPred, PolyPhen-2.1, Sorting Intolerant From Tolerant, and Xvar), using curated MMR multiple sequence alignments where possible. The output from each tool was calibrated by regression against the classifications of the 74 missense substitutions; these calibrated outputs are interpretable as prior probabilities of pathogenicity. MAPP was the most accurate tool, and MAPP + PolyPhen-2.1 provided the best combined model (R² = 0.62 and area under the receiver operating characteristic curve = 0.93). The MAPP + PolyPhen-2.1 output is sufficiently predictive to feed as a continuous variable into the quantitative Bayesian integrated evaluation for clinical classification of MMR gene missense substitutions.
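
To make the calibration step concrete, here is a minimal sketch of mapping a raw tool score to a probability by regression. The abstract does not state the regression form, so the logistic model, the reduction of the five classes to a binary label, the function names, and the toy data are all assumptions for illustration.

```python
import math

def fit_logistic(scores, labels, lr=0.1, epochs=5000):
    """Fit p = sigmoid(w * score + b) by batch gradient descent.

    scores: raw in silico tool outputs (floats).
    labels: 1 for pathogenic, 0 for not pathogenic (a simplifying assumption).
    """
    w, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def prior_probability(score, w, b):
    """Calibrated output, read as a prior probability of pathogenicity."""
    return 1.0 / (1.0 + math.exp(-(w * score + b)))

# Toy scores and labels (made up; not data from the study).
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 1, 0, 1, 0, 1, 1]
w, b = fit_logistic(toy_scores, toy_labels)
p_low = prior_probability(0.1, w, b)
p_high = prior_probability(0.9, w, b)
```

The point of calibration is that the output lies between 0 and 1 and rises with the raw score, so it can enter a Bayesian evaluation as a continuous prior rather than as a categorical call.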


Subjects
Computational Biology/methods , DNA Mismatch Repair/genetics , Genetic Predisposition to Disease/genetics , Mutation, Missense , Adaptor Proteins, Signal Transducing/genetics , Adenosine Triphosphatases/genetics , Bayes Theorem , Calibration , Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , Computational Biology/classification , Computational Biology/standards , DNA Repair Enzymes/genetics , DNA-Binding Proteins/genetics , Humans , Mismatch Repair Endonuclease PMS2 , MutL Protein Homolog 1 , MutS Homolog 2 Protein/genetics , Nuclear Proteins/genetics , Regression Analysis , Reproducibility of Results