Results 1 - 19 of 19
1.
Am J Hum Genet ; 111(2): 338-349, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38228144

ABSTRACT

Clinical exome and genome sequencing have revolutionized the understanding of human disease genetics. Yet many genes remain functionally uncharacterized, complicating the establishment of causal disease links for genetic variants. While several scoring methods have been devised to prioritize these candidate genes, these methods fall short of capturing the expression heterogeneity across cell subpopulations within tissues. Here, we introduce single-cell tissue-specific gene prioritization using machine learning (STIGMA), an approach that leverages single-cell RNA-seq (scRNA-seq) data to prioritize candidate genes associated with rare congenital diseases. STIGMA prioritizes genes by learning the temporal dynamics of gene expression across cell types during healthy organogenesis. To assess the efficacy of our framework, we applied STIGMA to mouse limb and human fetal heart scRNA-seq datasets. In a cohort of individuals with congenital limb malformation, STIGMA prioritized 469 variants in 345 genes, with UBA2 as a notable example. For congenital heart defects, we detected 34 genes harboring nonsynonymous de novo variants (nsDNVs) in two or more individuals from a set of 7,958 individuals, including the ortholog of Prdm1, which is associated with hypoplastic left ventricle and hypoplastic aortic arch. Overall, our findings demonstrate that STIGMA effectively prioritizes tissue-specific candidate genes by utilizing single-cell transcriptome data. The ability to capture the heterogeneity of gene expression across cell populations makes STIGMA a powerful tool for the discovery of disease-associated genes and facilitates the identification of causal variants underlying human genetic disorders.


Subjects
Heart Defects, Congenital , Transcriptome , Humans , Animals , Mice , Exome/genetics , Heart Defects, Congenital/genetics , Exome Sequencing , Machine Learning , Single-Cell Analysis/methods , Ubiquitin-Activating Enzymes/genetics
3.
PLoS Genet ; 17(7): e1009679, 2021 07.
Article in English | MEDLINE | ID: mdl-34324492

ABSTRACT

Numerous genetic studies have established a role for rare genomic variants in congenital heart disease (CHD) at the copy number variation (CNV) and de novo variant (DNV) levels. To identify novel haploinsufficient CHD disease genes, we performed an integrative analysis of CNVs and DNVs identified in probands with CHD, including cases with sporadic thoracic aortic aneurysm. We assembled CNV data from 7,958 cases and 14,082 controls and performed a gene-wise analysis of the burden of rare genomic deletions in cases versus controls. In addition, we performed variation rate testing for DNVs identified in 2,489 parent-offspring trios. Our analysis revealed 21 genes that were significantly affected by rare CNVs and/or DNVs in probands. Fourteen of these genes have previously been associated with CHD, while the remaining genes (FEZ1, MYO16, ARID1B, NALCN, WAC, KDM5B and WHSC1) have only been associated with it in small case series or show new associations with CHD. In addition, a systems-level analysis revealed affected protein-protein interaction networks involved in the Notch signaling pathway, heart morphogenesis, DNA repair and cilia/centrosome function. Taken together, this approach highlights the importance of re-analyzing existing datasets to strengthen disease associations and to identify novel disease genes and pathways.


Subjects
DNA Copy Number Variations/genetics , Haploinsufficiency/genetics , Heart Defects, Congenital/genetics , Databases, Genetic , Gene Expression/genetics , Gene Expression Profiling/methods , Genetic Predisposition to Disease/genetics , Genomics/methods , Humans , Ion Channels/genetics , Membrane Proteins/genetics , Polymorphism, Single Nucleotide/genetics , Transcriptome/genetics
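The abstract does not state which statistic was used for the gene-wise deletion burden analysis. A minimal sketch, assuming a one-sided Fisher's exact test on carrier counts (a common but here hypothetical choice; `fisher_one_sided` is an illustrative name), shows how cases and controls could be compared for a single gene:

```python
from math import comb


def fisher_one_sided(case_carriers: int, n_cases: int,
                     ctrl_carriers: int, n_controls: int) -> float:
    """One-sided Fisher's exact test for an excess of carriers among cases.

    Returns P(X >= case_carriers), where X is hypergeometric: the total number
    of carriers is fixed and distributed at random over cases and controls.
    """
    total = n_cases + n_controls
    carriers = case_carriers + ctrl_carriers
    # Sum the exact hypergeometric probabilities of all tables at least as extreme.
    upper = min(carriers, n_cases)
    numerator = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    # Integer arithmetic keeps the sum exact; divide only once at the end.
    return numerator / comb(total, carriers)


# Hypothetical counts for one gene (not taken from the study):
# 9 deletion carriers among 7,958 cases vs 2 among 14,082 controls.
p = fisher_one_sided(9, 7958, 2, 14082)
print(f"one-sided p = {p:.3g}")
```

Running the test once per gene and then correcting for the number of genes tested would turn this into a genome-wide scan; the multiple-testing correction is likewise an assumption, not something the abstract specifies.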
4.
Bioinformatics ; 38(5): 1470-1472, 2022 02 07.
Article in English | MEDLINE | ID: mdl-34904638

ABSTRACT

SUMMARY: We have implemented the pypgatk package and the pgdb workflow to create proteogenomics databases based on ENSEMBL resources. The tools allow the generation of protein sequences from novel protein-coding transcripts by performing a three-frame translation of pseudogenes, lncRNAs and other non-canonical transcripts, such as those produced by alternative splicing events. They also support exonic out-of-frame translation of otherwise canonical protein-coding mRNAs. Moreover, the tools enable the generation of variant protein sequences from multiple sources of genomic variants, including COSMIC, cBioPortal, gnomAD and mutations detected by sequencing of patient samples. pypgatk and pgdb provide multiple functionalities for database handling, including optimized target/decoy generation with the DecoyPyrat algorithm. Finally, we reanalyzed six public datasets in PRIDE by generating cell-type-specific databases for 65 cell lines with the pypgatk and pgdb workflow, revealing a wealth of non-canonical or cryptic peptides amounting to >5% of the total number of peptides identified. AVAILABILITY AND IMPLEMENTATION: The software is freely available. pypgatk: https://github.com/bigbio/py-pgatk/ and pgdb: https://nf-co.re/pgdb. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteogenomics , Humans , Peptides/genetics , Software , Algorithms , Proteins
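The three-frame translation step described in this abstract can be illustrated in plain Python. This is a minimal sketch, not pypgatk's implementation: it translates only the forward strand with the standard genetic code, marks stop codons as `*` and maps codons containing ambiguous bases to `X`. `translate`, `three_frame_translation` and `stop_free_fragments` are hypothetical helper names.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(seq: str) -> str:
    """Translate codon by codon; '*' marks a stop, 'X' an ambiguous codon."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def three_frame_translation(seq: str) -> list[str]:
    """Translate the forward strand in the frames starting at offsets 0, 1 and 2."""
    return [translate(seq[offset:]) for offset in range(3)]


def stop_free_fragments(protein: str, min_len: int = 6) -> list[str]:
    """Split a translated frame at stop codons and keep fragments of at least min_len."""
    return [frag for frag in protein.split("*") if len(frag) >= min_len]


frames = three_frame_translation("ATGGCCTAA")
print(frames)  # ['MA*', 'WP', 'GL']
```

Splitting each frame at stops, as `stop_free_fragments` does, is one simple way to turn a non-canonical transcript into candidate peptide sequences for a search database; the minimum length is an arbitrary assumption.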
6.
Nucleic Acids Res ; 48(W1): W380-W384, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32374843

ABSTRACT

The Omics Discovery Index is an open source platform that can be used to access, discover and disseminate omics datasets. OmicsDI integrates proteomics, genomics, metabolomics, models and transcriptomics datasets. Using an efficient indexing system, OmicsDI integrates different biological entities including genes, transcripts, proteins, metabolites and the corresponding publications from PubMed. In addition, it implements a group of pipelines to estimate the impact of each dataset by tracing the number of citations, reanalyses and biological entities reported by each dataset. Here, we present the OmicsDI REST interface (www.omicsdi.org/ws/) to enable programmatic access to any dataset in OmicsDI or to all the datasets for a specific provider (database). Clients can query the API using metadata such as sample details (species, tissues, etc.), instrumentation (mass spectrometer, sequencer), keywords and other provided annotations. In addition, we present two libraries, in R and Python, to facilitate the development of tools that can programmatically interact with the OmicsDI REST interface.


Subjects
Gene Expression Profiling/methods , Proteomics/methods , Software , Databases, Genetic , Datasets as Topic , Genomics/methods , Metabolomics/methods , User-Computer Interface
7.
Genet Med ; 23(10): 1952-1960, 2021 10.
Article in English | MEDLINE | ID: mdl-34113005

ABSTRACT

PURPOSE: Rare genetic variants in KDR, encoding the vascular endothelial growth factor receptor 2 (VEGFR2), have been reported in patients with tetralogy of Fallot (TOF). However, their role in disease causality and pathogenesis remains unclear. METHODS: We conducted exome sequencing in a familial case of TOF and large-scale genetic studies, including burden testing, in >1,500 patients with TOF. We studied gene-targeted mice and conducted cell-based assays to explore the role of KDR genetic variation in the etiology of TOF. RESULTS: Exome sequencing in a family with two siblings affected by TOF revealed biallelic missense variants in KDR. Studies in knock-in mice and in HEK 293T cells identified embryonic lethality for one variant when occurring in the homozygous state, and significantly reduced VEGFR2 phosphorylation for both variants. Rare variant burden analysis conducted in a set of 1,569 patients of European descent with TOF identified a 46-fold enrichment of protein-truncating variants (PTVs) in TOF cases compared to controls (P = 7 × 10⁻¹¹). CONCLUSION: Rare KDR variants, in particular PTVs, strongly associate with TOF, likely in the setting of different inheritance patterns. Supported by genetic as well as in vivo and in vitro functional analyses, we propose loss-of-function of VEGFR2 as one of the mechanisms involved in the pathogenesis of TOF.


Subjects
Tetralogy of Fallot , Vascular Endothelial Growth Factor Receptor-2 , Animals , Genetic Predisposition to Disease , HEK293 Cells , Humans , Mice , Tetralogy of Fallot/genetics , Vascular Endothelial Growth Factor Receptor-2/genetics , Exome Sequencing
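Fold enrichment in a rare-variant burden comparison is often summarized as an odds ratio from a 2×2 table of carriers and non-carriers. The abstract does not say which estimator produced the 46-fold figure, so the following is a hedged sketch using the standard odds ratio with a Woolf (log-scale) 95% confidence interval and a Haldane-Anscombe correction of 0.5 when any cell is zero. `odds_ratio_ci` is a hypothetical name, and the counts in the example call are placeholders, not data from the study.

```python
import math


def odds_ratio_ci(case_carriers: int, case_noncarriers: int,
                  ctrl_carriers: int, ctrl_noncarriers: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio with a Woolf (log-scale) confidence interval."""
    a, b, c, d = case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Placeholder 2x2 counts, not the study's data.
or_, lo, hi = odds_ratio_ci(10, 90, 5, 95)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```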
8.
Nucleic Acids Res ; 47(D1): D442-D450, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30395289

ABSTRACT

The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data, and is one of the founding members of the global ProteomeXchange (PX) consortium. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2016. In the last 3 years, public data sharing through PRIDE (as part of PX) has definitely become the norm in the field. In parallel, re-use of public proteomics data has increased enormously, with multiple applications. We first describe the new architecture of PRIDE Archive, the archival component of PRIDE. PRIDE Archive and the related data submission framework have been further developed to support the increase in submitted data volumes and additional data types. A new scalable and fault-tolerant storage backend, Application Programming Interface and web interface have been implemented as part of an ongoing process. Additionally, we emphasize the improved support for quantitative proteomics data through the mzTab format. Finally, we outline key statistics on the current data contents and volume of downloads, and how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.


Subjects
Databases, Protein , Mass Spectrometry , Proteomics , Peptides/chemistry , Software
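mzTab is a tab-delimited format in which the first column of each line identifies its section (for example `MTD` for metadata, and `PSH`/`PSM` for the peptide-spectrum match header and rows). A minimal reader, assuming well-formed mzTab 1.0-style input and performing no validation, might look like the following; `read_mztab` is a hypothetical helper, not part of any PRIDE tool.

```python
def read_mztab(lines):
    """Group mzTab lines by section prefix.

    Returns (metadata, sections): metadata maps MTD keys to values, and
    sections maps PRT/PEP/PSM/SML to lists of row dicts keyed by the column
    names from the matching header line (PRH/PEH/PSH/SMH).
    """
    header_of = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}
    metadata = {}
    headers = {}
    sections = {"PRT": [], "PEP": [], "PSM": [], "SML": []}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]
        elif prefix in header_of:
            headers[header_of[prefix]] = fields[1:]
        elif prefix in sections:
            columns = headers.get(prefix)
            if columns is None:
                raise ValueError(f"{prefix} row appears before its header line")
            sections[prefix].append(dict(zip(columns, fields[1:])))
        # COM (comment) lines and unrecognized prefixes are ignored.
    return metadata, sections


example = [
    "MTD\tmzTab-version\t1.0.0",
    "COM\tfree-text comment",
    "PSH\tsequence\tPSM_ID\taccession",
    "PSM\tPEPTIDEK\t1\tP12345",
]
meta, sections = read_mztab(example)
print(meta["mzTab-version"], sections["PSM"][0]["sequence"])
```

For real submissions, an established mzTab library that validates files against the specification is preferable to a hand-rolled reader like this one.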
9.
Bioinformatics ; 32(6): 821-7, 2016 03 15.
Article in English | MEDLINE | ID: mdl-26568629

ABSTRACT

MOTIVATION: In any macromolecular polyprotic system (for example, a protein, DNA or RNA), the isoelectric point, commonly referred to as the pI, can be defined as the point of singularity in a titration curve, corresponding to the solution pH at which the net overall surface charge, and thus the electrophoretic mobility, of the ampholyte sums to zero. Several modern analytical biochemistry and proteomics methods depend on the isoelectric point as a principal feature for protein and peptide characterization. Protein separation by isoelectric point is a critical part of 2-D gel electrophoresis, a key precursor of proteomics, in which discrete spots can be digested in-gel and the proteins subsequently identified by analytical mass spectrometry. Peptide fractionation according to pI is also widely used in current proteomics sample preparation prior to LC-MS/MS analysis. Accurate theoretical prediction of pI would therefore expedite such analyses. Although pI calculation is widely used, it remains largely untested, which motivated our efforts to benchmark pI prediction methods. RESULTS: Using data from the PIP-DB database and one publicly available dataset as our reference gold standard, we benchmarked pI calculation methods. We find that methods vary in their accuracy and are highly sensitive to the choice of basis set. The machine-learning algorithms, especially the SVM-based algorithm, showed superior performance on peptide mixtures. In general, learning-based pI prediction methods (such as Cofactor, SVM and Branca) require a large training dataset, and their performance depends strongly on the quality of those data. In contrast with iterative methods, machine-learning algorithms can incorporate new features to improve prediction accuracy.
CONTACT: yperez@ebi.ac.uk AVAILABILITY AND IMPLEMENTATION: The software and data are freely available at https://github.com/ypriverol/pIR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Amino Acid Sequence , Isoelectric Focusing , Isoelectric Point , Peptides , Proteomics , Tandem Mass Spectrometry
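The iterative approach mentioned in this benchmark can be sketched as a bisection search for the pH at which the Henderson-Hasselbalch net charge of a peptide is zero. The pKa values below are the EMBOSS set, chosen here only as one example; because the abstract reports that results are highly sensitive to this "basis set", swapping the `pka` dictionary is the main knob. This is an illustrative sketch, not the pIR implementation, and `net_charge` and `isoelectric_point` are hypothetical names.

```python
# EMBOSS pKa set, used here only as one example of a pKa "basis set".
PKA_EMBOSS = {
    "N_term": 8.6, "C_term": 3.6,
    "K": 10.8, "R": 12.5, "H": 6.5,           # side chains that carry + charge
    "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1,  # side chains that carry - charge
}


def net_charge(seq: str, ph: float, pka: dict = PKA_EMBOSS) -> float:
    """Henderson-Hasselbalch net charge of a peptide at a given pH."""
    positive = 1 / (1 + 10 ** (ph - pka["N_term"]))
    negative = 1 / (1 + 10 ** (pka["C_term"] - ph))
    for aa in seq.upper():
        if aa in ("K", "R", "H"):
            positive += 1 / (1 + 10 ** (ph - pka[aa]))
        elif aa in ("D", "E", "C", "Y"):
            negative += 1 / (1 + 10 ** (pka[aa] - ph))
    return positive - negative


def isoelectric_point(seq: str, pka: dict = PKA_EMBOSS,
                      tol: float = 1e-4) -> float:
    """Bisection on pH in [0, 14]; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid, pka) > 0:
            low = mid   # still positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


print(round(isoelectric_point("PEPTIDEK"), 2))
```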
11.
Nat Commun ; 12(1): 5854, 2021 10 06.
Article in English | MEDLINE | ID: mdl-34615866

ABSTRACT

The amount of public proteomics data is rapidly increasing, but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose extending the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve reproducibility and facilitate the reanalysis and integration of public proteomics datasets.


Subjects
Data Analysis , Databases, Protein , Metadata , Proteomics , Big Data , Humans , Reproducibility of Results , Software , Transcriptome
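A tab-delimited sample-to-data-file table in the MAGE-TAB style can be checked with a few lines of code. This is a minimal sketch: the required column names below are assumptions loosely modeled on SDRF-Proteomics naming, not the authoritative list, and `validate_sdrf` is a hypothetical helper rather than one of the tools described in the abstract.

```python
import csv
import io

# Assumed minimal set of required columns (illustrative, not the full specification).
REQUIRED_COLUMNS = [
    "source name",
    "characteristics[organism]",
    "assay name",
    "comment[data file]",
]


def validate_sdrf(text: str, required: list[str] = REQUIRED_COLUMNS) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["empty file"]
    header = [name.strip().lower() for name in rows[0]]
    errors = [f"missing column: {col}" for col in required if col not in header]
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            errors.append(f"line {line_no}: expected {len(header)} fields, got {len(row)}")
            continue
        for col in required:
            if col in header and not row[header.index(col)].strip():
                errors.append(f"line {line_no}: empty value for '{col}'")
    return errors


sdrf = ("source name\tcharacteristics[organism]\tassay name\tcomment[data file]\n"
        "sample 1\tHomo sapiens\trun 1\tfile1.raw\n")
print(validate_sdrf(sdrf))  # []
```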
12.
Genome Med ; 12(1): 76, 2020 08 28.
Article in English | MEDLINE | ID: mdl-32859249

ABSTRACT

BACKGROUND: Congenital heart disease (CHD) occurs in almost 1% of newborn children and is considered a multifactorial disorder. CHD may segregate in families owing to a significant contribution of genetic factors to the disease etiology. The aim of the study was to identify pathophysiological mechanisms in families segregating CHD. METHODS: We used whole exome sequencing to identify rare genetic variants in ninety consenting participants from 32 Danish families with recurrent CHD. We applied a systems biology approach to identify developmental mechanisms influenced by an accumulation of rare variants. We used an independent cohort of 714 CHD cases and 4922 controls for replication and performed functional investigations using zebrafish as an in vivo model. RESULTS: We identified 1785 genes in which rare alleles were shared between affected individuals within a family. These genes were enriched for known cardiac developmental genes, and 218 of them were mutated in more than one family. Our analysis revealed a functional cluster enriched for proteins with a known role in calcium signaling. Replication in an independent cohort confirmed an increased mutation burden of calcium-signaling genes in CHD patients. Functional investigation of the zebrafish orthologues of ITPR1, PLCB2, and ADCY2 verified a role in cardiac development and suggested a combinatorial effect of inactivating these genes. CONCLUSIONS: The study identifies abnormal calcium signaling as a novel pathophysiological mechanism in human CHD and confirms the complex genetic architecture underlying CHD.


Subjects
Calcium Signaling , Calcium/metabolism , Genetic Predisposition to Disease , Heart Defects, Congenital/genetics , Heart Defects, Congenital/metabolism , Systems Biology/methods , Alleles , Animals , Computational Biology/methods , Databases, Genetic , Denmark , Female , Genetic Association Studies/methods , Genetic Variation , Humans , Male , Protein Interaction Mapping , Protein Interaction Maps , Registries , Exome Sequencing , Zebrafish
13.
Nat Genet ; 52(1): 40-47, 2020 01.
Article in English | MEDLINE | ID: mdl-31844321

ABSTRACT

Valvular heart disease is observed in approximately 2% of the general population [1]. Although the initial observation is often localized (for example, to the aortic or mitral valve), disease manifestations are regularly observed in the other valves, and patients frequently require surgery. Despite the high frequency of heart valve disease, only a handful of genes have so far been identified as monogenic causes of disease [2-7]. Here we identify two consanguineous families, each with two affected family members presenting with progressive heart valve disease early in life. Whole-exome sequencing revealed homozygous, truncating nonsense alleles in ADAMTS19 in all four affected individuals. Homozygous knockout mice for Adamts19 show aortic valve dysfunction, recapitulating aspects of the human phenotype. Expression analysis using a lacZ reporter and single-cell RNA sequencing highlights Adamts19 as a novel marker for valvular interstitial cells; inference of gene regulatory networks in valvular interstitial cells positions Adamts19 in a highly discriminatory network driven by the transcription factor lymphoid enhancer-binding factor 1 downstream of the Wnt signaling pathway. Upregulation of endocardial Krüppel-like factor 2 in Adamts19 knockout mice precedes hemodynamic perturbation, showing that a tight balance in the Wnt-Adamts19-Klf2 axis is required for proper valve maturation and maintenance.


Subjects
ADAMTS Proteins/metabolism , Gene Expression Regulation, Developmental , Heart Valve Diseases/etiology , ADAMTS Proteins/genetics , Animals , Family , Female , Heart Valve Diseases/pathology , Humans , Kruppel-Like Transcription Factors/genetics , Kruppel-Like Transcription Factors/metabolism , Male , Mice , Mice, Knockout , Pedigree , Single-Cell Analysis , Wnt Signaling Pathway
14.
Clin Epigenetics ; 11(1): 89, 2019 06 11.
Article in English | MEDLINE | ID: mdl-31186048

ABSTRACT

BACKGROUND: Cardiac disease modelling using human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) requires thorough insight into cardiac cell type differentiation processes. However, current methods to discriminate between cardiac cell types are mostly time-consuming and costly, and they often provide imprecise phenotypic evaluation. DNA methylation plays a critical role during early heart development and cardiac cellular specification. We therefore investigated the DNA methylation patterns of different cardiac tissues to identify CpG loci for further characterization of cardiac cell types. RESULTS: An array-based genome-wide DNA methylation analysis using Illumina Infinium HumanMethylation450 BeadChips led to the identification of 168 differentially methylated CpG loci between atrial and ventricular human heart tissue samples (n = 49) from different patients with congenital heart defects (CHD). Systematic evaluation of atrial-ventricular DNA methylation patterns by bisulfite pyrosequencing in an independent cohort of non-failing donor hearts and cardiac patients helped us to define a subset of 16 differentially methylated CpG loci that enable precise characterization of human atrial and ventricular cardiac tissue samples. This defined set of reproducible cardiac tissue-specific DNA methylation sites allowed us to consistently determine the cellular identity of hiPSC-CM subtypes. CONCLUSION: Testing the DNA methylation of only a small set of defined CpG sites thus makes it possible to distinguish atrial from ventricular cardiac tissues, as well as atrial and ventricular hiPSC-CM subtypes. This method represents a rapid and reliable system for the phenotypic characterization of in vitro-generated cardiomyocytes and opens new opportunities for cardiovascular research and patient-specific therapy.


Subjects
DNA Methylation , Heart Atria/cytology , Heart Defects, Congenital/pathology , Heart Ventricles/cytology , Myocytes, Cardiac/cytology , Cells, Cultured , CpG Islands , Female , Heart Atria/chemistry , Heart Defects, Congenital/genetics , Heart Ventricles/chemistry , Humans , Induced Pluripotent Stem Cells/chemistry , Induced Pluripotent Stem Cells/cytology , Male , Models, Biological , Myocytes, Cardiac/chemistry , Organ Specificity , Sequence Analysis, DNA , Tissue Engineering
15.
PLoS One ; 12(12): e0189875, 2017.
Article in English | MEDLINE | ID: mdl-29261781

ABSTRACT

We are moving into the age of 'Big Data' in biomedical research and bioinformatics. This trend can be encapsulated in a simple formula, D = S × F, where the volume of data generated (D) increases along both dimensions: the number of samples (S) and the number of sample features (F). A typical omics classification problem frequently includes redundant and irrelevant features (e.g., genes or proteins), which can lead to long computation times, decreased model performance and the selection of suboptimal features (genes and proteins) after the classification/regression step. Multiple algorithms and reviews have been published describing the existing feature selection (FS) methods and their strengths and weaknesses. However, selecting the right FS algorithm and strategy remains an enormous challenge. Despite the number and diversity of algorithms available, the proper choice of an approach for a specific problem often falls in a 'grey zone'. In this study, we select a subset of FS methods to develop an efficient workflow and an R package for bioinformatics machine learning problems. We cover relevant issues concerning FS, ranging from domain problems to algorithmic solutions and computational tools. Finally, we use seven different proteomics and gene expression datasets to evaluate the workflow and guide the FS process.


Subjects
Algorithms , Databases as Topic , Genomics/methods , Workflow , Humans , Multivariate Analysis , Principal Component Analysis , Support Vector Machine
16.
J Proteomics ; 150: 170-182, 2017 01 06.
Article in English | MEDLINE | ID: mdl-27498275

ABSTRACT

In mass spectrometry-based shotgun proteomics, protein identifications are usually the desired result. However, most analytical methods identify reliable peptides rather than intact proteins directly. Assembling the peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is therefore a critical step in proteomics research. Several protein inference algorithms and tools are currently available to the proteomics community. Here, we evaluated five software tools for protein inference (PIA, ProteinProphet, Fido, ProteinLP, MSBayesPro) using three popular database search engines: Mascot, X!Tandem, and MS-GF+. All the algorithms were evaluated with a highly customizable KNIME workflow on four public datasets of varying complexity (different sample preparation, species and analytical instruments). We defined a set of quality control metrics to evaluate the performance of each combination of search engine, protein inference algorithm, and parameters on each dataset. We show that the results for complex samples vary not only in the number of reported protein groups but also in the composition of those groups. Furthermore, the robustness of the reported proteins when using databases of differing complexity depends strongly on the inference algorithm applied. Finally, merging the identifications of multiple search engines does not necessarily increase the number of reported proteins, but it does increase the number of peptides per protein and can therefore generally be recommended.
SIGNIFICANCE: Protein inference is currently one of the major challenges in MS-based proteomics, and a large number of protein inference algorithms and implementations are available to the proteomics community. Protein assembly affects the final results of a study, the quantitation values and the final claims of the research manuscript. Even though protein inference is a crucial step in proteomics data analysis, a comprehensive evaluation of the many different inference methods has never been performed. The Journal of Proteomics has previously published multiple benchmarks of bioinformatics algorithms in proteomics (PMID: 26585461; PMID: 22728601), underscoring the importance of such studies for the proteomics community and the journal's audience. This manuscript presents a new bioinformatics solution based on the KNIME/OpenMS platform that aims to provide a fair comparison of protein inference algorithms (https://github.com/KNIME-OMICS). Five algorithms (ProteinProphet, MSBayesPro, ProteinLP, Fido and PIA) were evaluated using the highly customizable workflow on four public datasets of varying complexity. Three popular database search engines (Mascot, X!Tandem and MS-GF+) and combinations thereof were evaluated for every protein inference tool. In total, more than 186 protein lists were analyzed and carefully compared using three metrics to assess the quality of the protein inference results: 1) the number of reported proteins, 2) the number of peptides per protein, and 3) the number of proteins reported uniquely by each inference method. We also examined how many proteins were reported for each combination of search engines, protein inference algorithms and parameters on each dataset. The results show that: 1) PIA or Fido seem to be good choices for the analyzed workflow, considering not only the reported proteins and the high-quality identifications but also the required runtime; 2) merging the identifications of multiple search engines almost always gives more confident results and increases the number of peptides per protein group; 3) using databases that contain known protein isoforms in addition to the canonical sequences has a small impact on the number of reported proteins, and, depending on the question behind the study, the detection of specific isoforms could compensate for the slightly shorter parsimonious reports; and 4) the current workflow can easily be extended to support new algorithms and search engine combinations.


Subjects
Algorithms , Computational Biology/methods , Databases, Protein , Proteomics/methods , Search Engine/methods , Humans , Peptides/chemistry , Protein Isoforms , Software , Tandem Mass Spectrometry
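Many inference tools report a parsimonious protein list, i.e. the smallest set of proteins that explains all identified peptides. The sketch below approximates this with a greedy set cover. It is a deliberately simple stand-in, not the algorithm used by PIA, ProteinProphet, Fido, ProteinLP or MSBayesPro, and it ignores peptide scores and shared-peptide grouping; `greedy_parsimony` and the accessions in the example are hypothetical.

```python
def greedy_parsimony(protein_to_peptides):
    """Greedy set cover: repeatedly pick the protein explaining most unexplained peptides.

    Ties are broken alphabetically by accession so the result is deterministic.
    """
    mapping = {prot: set(peps) for prot, peps in protein_to_peptides.items()}
    unexplained = set().union(*mapping.values())
    selected = []
    while unexplained:
        # max() returns the first maximal item, i.e. the alphabetically first one here.
        best = max(sorted(mapping), key=lambda p: len(mapping[p] & unexplained))
        selected.append(best)
        unexplained -= mapping[best]
    return selected


evidence = {
    "P1": {"AAK", "BBR", "CCK"},
    "P2": {"BBR", "CCK"},   # every peptide is also explained by P1
    "P3": {"DDK"},
    "P4": {"CCK", "DDK"},
}
print(greedy_parsimony(evidence))  # ['P1', 'P3']
```

Note that P4 would explain the remaining peptide equally well; real tools report such ties as protein groups instead of silently choosing one member, which is one reason group composition can differ between tools.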
17.
J Pharm Biomed Anal ; 105: 107-114, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25546027

ABSTRACT

A fully validated bioanalytical method based on matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) was developed to quantify the anti-tumor peptide CIGB-300 in human plasma. An analog of this peptide acetylated at the N-terminus was used as the internal standard for absolute quantitation. Acid treatment allowed efficient precipitation of plasma proteins as well as high recovery (approximately 80%) of the intact peptide. No chromatographic step was required for sample processing before MALDI-MS analysis. Spectra were acquired in linear positive-ion mode to ensure maximum sensitivity. The lower limit of quantitation was established at 0.5 µg/mL, which is equivalent to 160 fmol of peptide. The calibration curve was linear from 0.5 to 7.5 µg/mL, with R² > 0.98, and permitted quantitation of highly concentrated samples, as evaluated by dilution integrity testing. All parameters assessed for five validation batches met the FDA guidelines for industry. The method was successfully applied to the analysis of clinical samples obtained in a phase I clinical trial following intravenous administration of CIGB-300 at a dose of 1.6 mg/kg body weight. With the exception of Cmax and AUC, the pharmacokinetic parameters were similar for the ELISA and MALDI-MS methods.


Subjects
Antineoplastic Agents/blood , Peptides, Cyclic/blood , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Acetylation , Antineoplastic Agents/chemistry , Clinical Trials as Topic , Humans , Injections, Intravenous , Limit of Detection , Neoplasms/blood , Neoplasms/drug therapy , Peptides, Cyclic/chemistry , Reference Standards , Reproducibility of Results , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/instrumentation
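The linear calibration described in this abstract can be sketched as an ordinary least-squares fit of the analyte/internal-standard signal ratio against nominal concentration, followed by back-calculation of unknowns. The unweighted fit and the `dilution_factor` parameter are assumptions for illustration: the abstract does not state the regression weighting used, and bioanalytical methods often apply 1/x or 1/x² weighting. The function names and numbers are hypothetical.

```python
def fit_calibration(conc, response):
    """Unweighted least-squares line response = slope * conc + intercept, with R^2."""
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(response) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, response))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(conc, response))
    ss_tot = sum((y - mean_y) ** 2 for y in response)
    return slope, intercept, 1 - ss_res / ss_tot


def back_calculate(response, slope, intercept, dilution_factor=1.0):
    """Concentration of an unknown; multiply by the dilution factor for diluted samples."""
    return (response - intercept) / slope * dilution_factor


# Hypothetical standards spanning 0.5-7.5 µg/mL and their analyte/internal-standard
# peak-area ratios (illustrative numbers only, not data from the study).
standards = [0.5, 1.0, 2.5, 5.0, 7.5]
ratios = [0.11, 0.21, 0.50, 1.02, 1.49]
slope, intercept, r2 = fit_calibration(standards, ratios)
print(f"slope={slope:.4f} intercept={intercept:.4f} R^2={r2:.4f}")
print(back_calculate(0.80, slope, intercept))
```

In a dilution integrity experiment, a sample above the upper calibration limit is diluted into range, measured, and multiplied back by the dilution factor; the back-calculated value is then compared with the nominal concentration.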
18.
Curr Top Med Chem ; 14(3): 388-97, 2014.
Article in English | MEDLINE | ID: mdl-24304317

ABSTRACT

The field of proteomics has grown rapidly in recent years. This has been due fundamentally to technological improvements in instrumentation, methods and easy-to-use software, which have made it possible to address a large number of biological questions and to study the proteomes of many organisms in greater depth. This development has created a persisting challenge: the computational analysis of the large datasets commonly generated in a single proteomics experiment. One way to tackle this general issue has been to use auxiliary information generated during the proteomics experiment to validate the confidence of the identifications. In this manuscript we review the main molecular descriptors used to build predictive models for estimating retention time, isoelectric point and peptide "detectability", which are key tools in the design of several validation strategies based on these criteria. We also give an overview of the main open-source tools and libraries used to compute molecular descriptors.


Subjects
Mass Spectrometry , Proteomics , Software
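As an example of the simple sequence-derived descriptors that such predictive models start from, the sketch below computes peptide length, the Kyte-Doolittle GRAVY hydrophobicity score and the count of basic residues. The choice of these three descriptors is illustrative only (real retention-time or detectability models typically combine many more), and `peptide_descriptors` is a hypothetical name.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def peptide_descriptors(seq: str) -> dict:
    """A few sequence-derived descriptors for a peptide (standard residues only)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return {
        "length": len(seq),
        # GRAVY: mean Kyte-Doolittle hydropathy over all residues.
        "gravy": sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq),
        # Basic residues strongly influence charge state and pI.
        "basic_residues": sum(seq.count(aa) for aa in "KRH"),
    }


print(peptide_descriptors("LGEYGFQNALIVR"))
```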
19.
J Proteomics ; 75(7): 2269-74, 2012 Apr 03.
Article in English | MEDLINE | ID: mdl-22326964

ABSTRACT

IPG (immobilized pH gradient)-based separations are frequently used as the first step in shotgun proteomics methods; they increase both the dynamic range and the resolution of peptide separation prior to LC-MS analysis. Experimental isoelectric point (pI) values can improve peptide identifications in conjunction with MS/MS information. Accurate estimation of the pI value from the amino acid sequence is therefore critical for these kinds of experiments. Nowadays, pI is commonly predicted using the charge-state model [1] and/or the cofactor algorithm [2]. However, neither of these methods calculates the pI of basic peptides accurately. In this manuscript, we present a new approach that can significantly improve pI estimation by using Support Vector Machines (SVM) [3], an experimental amino acid descriptor taken from the AAIndex database [4] and the isoelectric point predicted by the charge-state model. Our results show a strong correlation (R² = 0.98) between the predicted and observed values, with a standard deviation of 0.32 pH units across the complete pH range.


Subjects
Models, Chemical , Peptides/chemistry , Support Vector Machine , Isoelectric Point