ABSTRACT
Clinical exome and genome sequencing have revolutionized the understanding of human disease genetics. Yet many genes remain functionally uncharacterized, complicating the establishment of causal disease links for genetic variants. While several scoring methods have been devised to prioritize these candidate genes, these methods fall short of capturing the expression heterogeneity across cell subpopulations within tissues. Here, we introduce single-cell tissue-specific gene prioritization using machine learning (STIGMA), an approach that leverages single-cell RNA-seq (scRNA-seq) data to prioritize candidate genes associated with rare congenital diseases. STIGMA prioritizes genes by learning the temporal dynamics of gene expression across cell types during healthy organogenesis. To assess the efficacy of our framework, we applied STIGMA to mouse limb and human fetal heart scRNA-seq datasets. In a cohort of individuals with congenital limb malformation, STIGMA prioritized 469 variants in 345 genes, with UBA2 as a notable example. For congenital heart defects, we detected 34 genes harboring nonsynonymous de novo variants (nsDNVs) in two or more individuals from a set of 7,958 individuals, including the ortholog of Prdm1, which is associated with hypoplastic left ventricle and hypoplastic aortic arch. Overall, our findings demonstrate that STIGMA effectively prioritizes tissue-specific candidate genes by utilizing single-cell transcriptome data. The ability to capture the heterogeneity of gene expression across cell populations makes STIGMA a powerful tool for the discovery of disease-associated genes and facilitates the identification of causal variants underlying human genetic disorders.
Subject(s)
Heart Defects, Congenital , Transcriptome , Humans , Animals , Mice , Exome/genetics , Heart Defects, Congenital/genetics , Exome Sequencing , Machine Learning , Single-Cell Analysis/methods , Ubiquitin-Activating Enzymes/genetics
ABSTRACT
Numerous genetic studies have established a role for rare genomic variants in Congenital Heart Disease (CHD) at the copy number variation (CNV) and de novo variant (DNV) level. To identify novel haploinsufficient CHD disease genes, we performed an integrative analysis of CNVs and DNVs identified in probands with CHD, including cases with sporadic thoracic aortic aneurysm. We assembled CNV data from 7,958 cases and 14,082 controls and performed a gene-wise analysis of the burden of rare genomic deletions in cases versus controls. In addition, we performed variation rate testing for DNVs identified in 2,489 parent-offspring trios. Our analysis revealed 21 genes that were significantly affected by rare CNVs and/or DNVs in probands. Fourteen of these genes have previously been associated with CHD, while the remaining genes (FEZ1, MYO16, ARID1B, NALCN, WAC, KDM5B and WHSC1) have only been implicated in small case series or show new associations with CHD. In addition, a systems-level analysis revealed affected protein-protein interaction networks involved in the Notch signaling pathway, heart morphogenesis, DNA repair and cilia/centrosome function. Taken together, this approach highlights the importance of re-analyzing existing datasets to strengthen disease associations and identify novel disease genes and pathways.
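The gene-wise case-control burden comparison described above can be sketched as a one-sided Fisher's exact test on a 2x2 carrier table. The carrier counts below are hypothetical, and the study's actual statistical procedure may differ:

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test (enrichment) for the 2x2 table
    [[a, b], [c, d]]: P(X >= a) under the hypergeometric null."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, col1)
    p = 0.0
    # sum hypergeometric probabilities for tables at least as extreme as observed
    for x in range(a, min(row1, col1) + 1):
        p += comb(row1, x) * comb(n - row1, col1 - x) / denom
    return p

# hypothetical: 9 of 7,958 cases vs 1 of 14,082 controls carry a rare
# deletion overlapping a given gene
p = fisher_one_sided(9, 7958 - 9, 1, 14082 - 1)
```

Such a per-gene p-value would then need multiple-testing correction across all genes tested.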
Subject(s)
DNA Copy Number Variations/genetics , Haploinsufficiency/genetics , Heart Defects, Congenital/genetics , Databases, Genetic , Gene Expression/genetics , Gene Expression Profiling/methods , Genetic Predisposition to Disease/genetics , Genomics/methods , Humans , Ion Channels/genetics , Membrane Proteins/genetics , Polymorphism, Single Nucleotide/genetics , Transcriptome/genetics
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pgen.1009679.].
ABSTRACT
SUMMARY: We have implemented the pypgatk package and the pgdb workflow to create proteogenomics databases based on ENSEMBL resources. The tools allow the generation of protein sequences from novel protein-coding transcripts by performing a three-frame translation of pseudogenes, lncRNAs and other non-canonical transcripts, such as those produced by alternative splicing events. It also includes exonic out-of-frame translation from otherwise canonical protein-coding mRNAs. Moreover, the tool enables the generation of variant protein sequences from multiple sources of genomic variants including COSMIC, cBioportal, gnomAD and mutations detected from sequencing of patient samples. pypgatk and pgdb provide multiple functionalities for database handling including optimized target/decoy generation by the algorithm DecoyPyrat. Finally, we have reanalyzed six public datasets in PRIDE by generating cell-type specific databases for 65 cell lines using the pypgatk and pgdb workflow, revealing a wealth of non-canonical or cryptic peptides amounting to >5% of the total number of peptides identified. AVAILABILITY AND IMPLEMENTATION: The software is freely available. pypgatk: https://github.com/bigbio/py-pgatk/ and pgdb: https://nf-co.re/pgdb. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
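The three-frame translation that pypgatk applies to pseudogenes, lncRNAs and other non-canonical transcripts can be illustrated with a minimal sketch. This is not pypgatk's implementation, and the codon table is deliberately truncated to the codons used in the example:

```python
# minimal three-frame translation sketch; codon table truncated for brevity
# (a real tool uses the full standard genetic code)
CODON_TABLE = {
    "ATG": "M", "GCC": "A", "AAA": "K", "TGA": "*", "GGC": "G",
    "CAT": "H", "TAA": "*", "CCA": "P", "TTT": "F",
}

def three_frame_translate(seq):
    """Translate a transcript in its three forward reading frames,
    as done when building proteogenomics databases from non-coding RNAs."""
    proteins = []
    for frame in range(3):
        aa = []
        for i in range(frame, len(seq) - 2, 3):
            aa.append(CODON_TABLE.get(seq[i:i + 3], "X"))  # X = unknown codon
        proteins.append("".join(aa))
    return proteins

frames = three_frame_translate("ATGGCCAAA")
```

Each of the three resulting sequences would then be added to the search database so that peptides from any frame can be matched.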
Subject(s)
Proteogenomics , Humans , Peptides/genetics , Software , Algorithms , Proteins
ABSTRACT
The Omics Discovery Index is an open source platform that can be used to access, discover and disseminate omics datasets. OmicsDI integrates proteomics, genomics, metabolomics, models and transcriptomics datasets. Using an efficient indexing system, OmicsDI integrates different biological entities including genes, transcripts, proteins, metabolites and the corresponding publications from PubMed. In addition, it implements a group of pipelines to estimate the impact of each dataset by tracing the number of citations, reanalyses and biological entities reported by each dataset. Here, we present the OmicsDI REST interface (www.omicsdi.org/ws/) to enable programmatic access to any dataset in OmicsDI or all the datasets for a specific provider (database). Clients can perform queries on the API using different metadata information such as sample details (species, tissues, etc.), instrumentation (mass spectrometer, sequencer), keywords and other provided annotations. In addition, we present two different libraries in R and Python to facilitate the development of tools that can programmatically interact with the OmicsDI REST interface.
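Programmatic access of the kind the REST interface enables might look like the following sketch. The `/dataset/search` endpoint path and the filter parameter names are assumptions for illustration, not a verified API contract:

```python
from urllib.parse import urlencode

BASE = "https://www.omicsdi.org/ws"  # REST root given in the text

def build_search_url(query, **filters):
    """Build an OmicsDI dataset-search URL; the '/dataset/search' path and
    the filter names are illustrative assumptions, not a documented contract."""
    params = {"query": query}
    params.update(filters)
    return f"{BASE}/dataset/search?{urlencode(params)}"

# hypothetical query: human proteomics datasets, first 10 hits
url = build_search_url("proteomics", species="Homo sapiens", size=10)
```

A client would then fetch this URL with any HTTP library and parse the JSON response, which is what the R and Python libraries mentioned above wrap for the user.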
Subject(s)
Gene Expression Profiling/methods , Proteomics/methods , Software , Databases, Genetic , Datasets as Topic , Genomics/methods , Metabolomics/methods , User-Computer Interface
ABSTRACT
PURPOSE: Rare genetic variants in KDR, encoding the vascular endothelial growth factor receptor 2 (VEGFR2), have been reported in patients with tetralogy of Fallot (TOF). However, their role in disease causality and pathogenesis remains unclear. METHODS: We conducted exome sequencing in a familial case of TOF and large-scale genetic studies, including burden testing, in >1,500 patients with TOF. We studied gene-targeted mice and conducted cell-based assays to explore the role of KDR genetic variation in the etiology of TOF. RESULTS: Exome sequencing in a family with two siblings affected by TOF revealed biallelic missense variants in KDR. Studies in knock-in mice and in HEK 293T cells identified embryonic lethality for one variant when occurring in the homozygous state, and significantly reduced VEGFR2 phosphorylation for both variants. Rare variant burden analysis conducted in a set of 1,569 patients of European descent with TOF identified a 46-fold enrichment of protein-truncating variants (PTVs) in TOF cases compared to controls (P = 7 × 10^-11). CONCLUSION: Rare KDR variants, in particular PTVs, strongly associate with TOF, likely in the setting of different inheritance patterns. Supported by genetic and in vivo and in vitro functional analysis, we propose loss-of-function of VEGFR2 as one of the mechanisms involved in the pathogenesis of TOF.
Subject(s)
Tetralogy of Fallot , Vascular Endothelial Growth Factor Receptor-2 , Animals , Genetic Predisposition to Disease , HEK293 Cells , Humans , Mice , Tetralogy of Fallot/genetics , Vascular Endothelial Growth Factor Receptor-2/genetics , Exome Sequencing
ABSTRACT
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data, and is one of the founding members of the global ProteomeXchange (PX) consortium. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2016. In the last 3 years, public data sharing through PRIDE (as part of PX) has definitely become the norm in the field. In parallel, data re-use of public proteomics data has increased enormously, with multiple applications. We first describe the new architecture of PRIDE Archive, the archival component of PRIDE. PRIDE Archive and the related data submission framework have been further developed to support the increase in submitted data volumes and additional data types. A new scalable and fault-tolerant storage backend, Application Programming Interface and web interface have been implemented, as a part of an ongoing process. Additionally, we emphasize the improved support for quantitative proteomics data through the mzTab format. Lastly, we outline key statistics on the current data contents and volume of downloads, and how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.
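The mzTab format mentioned above is a tab-separated text format in which each line is typed by its first column. A minimal sketch of collecting protein rows (the `PRH`/`PRT` line types follow the mzTab specification, but real files carry many more columns and line types):

```python
def parse_mztab_proteins(lines):
    """Collect protein rows from mzTab text: a 'PRH' line gives the column
    names and 'PRT' lines carry the values (tab-separated)."""
    header, proteins = None, []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "PRH":
            header = fields[1:]
        elif fields[0] == "PRT" and header:
            proteins.append(dict(zip(header, fields[1:])))
    return proteins

# tiny illustrative fragment, not a complete valid mzTab file
demo = [
    "MTD\tmzTab-version\t1.0",
    "PRH\taccession\tdescription",
    "PRT\tP12345\tExample protein",
]
rows = parse_mztab_proteins(demo)
```

Production code would use a dedicated mzTab library rather than hand-rolled parsing, but the line-typed structure is what makes the format easy to stream.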
Subject(s)
Databases, Protein , Mass Spectrometry , Proteomics , Peptides/chemistry , Software
ABSTRACT
MOTIVATION: In any macromolecular polyprotic system - for example protein, DNA or RNA - the isoelectric point - commonly referred to as the pI - can be defined as the point of singularity in a titration curve, corresponding to the solution pH value at which the net overall surface charge - and thus the electrophoretic mobility - of the ampholyte sums to zero. Different modern analytical biochemistry and proteomics methods depend on the isoelectric point as a principal feature for protein and peptide characterization. Protein separation by isoelectric point is a critical part of 2-D gel electrophoresis, a key precursor of proteomics, where discrete spots can be digested in-gel, and proteins subsequently identified by analytical mass spectrometry. Peptide fractionation according to pI is also widely used in current proteomics sample preparation procedures prior to LC-MS/MS analysis. Therefore, accurate theoretical prediction of pI would expedite such analysis. While such pI calculation is widely used, it remains largely untested, motivating our efforts to benchmark pI prediction methods. RESULTS: Using data from the database PIP-DB and one publicly available dataset as our reference gold standard, we have undertaken the benchmarking of pI calculation methods. We find that methods vary in their accuracy and are highly sensitive to the choice of basis set. The machine-learning algorithms, especially the SVM-based algorithm, showed superior performance when studying peptide mixtures. In general, learning-based pI prediction methods (such as Cofactor, SVM and Branca) require a large training dataset, and their resulting performance will strongly depend on the quality of that data. In contrast with iterative methods, machine-learning algorithms have the advantage of being able to add new features to improve the accuracy of prediction.
CONTACT: yperez@ebi.ac.uk AVAILABILITY AND IMPLEMENTATION: The software and data are freely available at https://github.com/ypriverol/pIR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
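The iterative pI calculation being benchmarked can be sketched as a bisection on the Henderson-Hasselbalch titration curve. The pKa values below are one possible basis set chosen for illustration only; as the abstract notes, results are highly sensitive to this choice:

```python
# illustrative pKa basis set; published sets (EMBOSS, Bjellqvist, ...) differ,
# which is precisely the sensitivity the benchmark highlights
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

def net_charge(seq, ph):
    """Net charge at a given pH from Henderson-Hasselbalch fractions."""
    pos = 10 ** -ph / (10 ** -PKA_POS["Nterm"] + 10 ** -ph)
    neg = 10 ** -PKA_NEG["Cterm"] / (10 ** -PKA_NEG["Cterm"] + 10 ** -ph)
    for aa in seq:
        if aa in PKA_POS:
            pos += 10 ** -ph / (10 ** -PKA_POS[aa] + 10 ** -ph)
        elif aa in PKA_NEG:
            neg += 10 ** -PKA_NEG[aa] / (10 ** -PKA_NEG[aa] + 10 ** -ph)
    return pos - neg

def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection on the titration curve: the pI is the pH of zero net charge."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return round((lo + hi) / 2, 2)
```

The bisection converges because net charge decreases monotonically with pH; learning-based methods instead fit the pI directly from sequence-derived features.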
Subject(s)
Amino Acid Sequence , Isoelectric Focusing , Isoelectric Point , Peptides , Proteomics , Tandem Mass Spectrometry
ABSTRACT
BACKGROUND: Congenital heart disease (CHD) is the most common congenital anomaly. Almost 90% of isolated cases have an unexplained genetic etiology after clinical testing. Non-canonical splice variants that disrupt mRNA splicing through the loss or creation of exon boundaries are not routinely captured and/or evaluated by standard clinical genetic tests. Recent computational algorithms such as SpliceAI have shown an ability to predict such variants, but are not specific to cardiac-expressed genes and transcriptional isoforms. METHODS: We used genome sequencing (GS) (n = 1101 CHD probands) and myocardial RNA-Sequencing (RNA-Seq) (n = 154 CHD and n = 43 cardiomyopathy probands) to identify and validate splice-disrupting variants, and to develop a heart-specific model for canonical and non-canonical splice variants that can be applied to patients with CHD and cardiomyopathy. Two thousand five hundred seventy GS samples from the Medical Genome Reference Bank were analyzed as healthy controls. RESULTS: Of 8583 rare DNA splice-disrupting variants initially identified using SpliceAI, 100 were associated with altered splice junctions in the corresponding patient myocardium, affecting 95 genes. Using strength of myocardial gene expression and genome-wide DNA variant features that were confirmed to affect splicing in myocardial RNA, we trained a machine learning model for predicting cardiac-specific splice-disrupting variants (AUC 0.86 on internal validation). In a validation set of 48 CHD probands, the cardiac-specific model outperformed a SpliceAI model alone (AUC 0.94 vs 0.67, respectively). Application of this model to an additional 947 CHD probands with only GS data identified 1% of patients with canonical and 11% of patients with non-canonical splice-disrupting variants in CHD genes. Forty-nine percent of predicted splice-disrupting variants were intronic and > 10 bp from existing splice junctions.
The burden of high-confidence splice-disrupting variants in CHD genes was 1.28-fold higher in CHD cases compared with healthy controls. CONCLUSIONS: A new cardiac-specific in silico model was developed using complementary GS and RNA-Seq data that improved genetic yield by identifying a significant burden of non-canonical splice variants associated with CHD that would not be detectable through panel or exome sequencing.
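The initial filtering step - keeping rare variants with a high predicted splicing impact in genes actually expressed in myocardium - can be sketched as follows. The record fields, gene names and thresholds are hypothetical, not the study's actual cutoffs:

```python
# hypothetical variant records; 'delta' mimics a SpliceAI-style maximum
# delta score and 'tpm' a myocardial expression level
variants = [
    {"gene": "GATA4", "delta": 0.91, "tpm": 35.0},
    {"gene": "NOTCH1", "delta": 0.15, "tpm": 60.0},   # low predicted impact
    {"gene": "OR4F5", "delta": 0.88, "tpm": 0.1},     # not expressed in heart
]

def candidate_splice_variants(records, delta_cutoff=0.5, tpm_cutoff=1.0):
    """Keep variants with a high predicted splicing impact in genes that
    are expressed in the tissue of interest (both thresholds illustrative)."""
    return [v for v in records
            if v["delta"] >= delta_cutoff and v["tpm"] >= tpm_cutoff]

hits = candidate_splice_variants(variants)
```

In the actual study, such features feed a trained model rather than fixed cutoffs, which is what lifts the AUC above SpliceAI alone.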
Subject(s)
RNA Splicing , Humans , Child , Male , Female , Heart Defects, Congenital/genetics , Myocardium/metabolism , Myocardium/pathology , Alternative Splicing
ABSTRACT
Clinical presentation of congenital heart disease is heterogeneous, making identification of the disease-causing genes and their genetic pathways and mechanisms of action challenging. By using in vivo electrocardiography, transthoracic echocardiography and microcomputed tomography imaging to screen 3,894 single-gene-null mouse lines for structural and functional cardiac abnormalities, here we identify 705 lines with cardiac arrhythmia, myocardial hypertrophy and/or ventricular dilation. Among these 705 genes, 486 have not been previously associated with cardiac dysfunction in humans, and some of them represent variants of unknown relevance (VUR). Mice with mutations in Casz1, Dnajc18, Pde4dip, Rnf38 or Tmem161b genes show developmental cardiac structural abnormalities, with their human orthologs being categorized as VUR. Using UK Biobank data, we validate the importance of the DNAJC18 gene for cardiac homeostasis by showing that its loss of function is associated with altered left ventricular systolic function. Our results identify hundreds of previously unappreciated genes with potential function in congenital heart disease and suggest causal function of five VUR in congenital heart disease.
ABSTRACT
The amount of public proteomics data is rapidly increasing, but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve reproducibility and facilitate the reanalysis and integration of public proteomics datasets.
Subject(s)
Data Analysis , Databases, Protein , Metadata , Proteomics , Big Data , Humans , Reproducibility of Results , Software , Transcriptome
ABSTRACT
BACKGROUND: Congenital heart disease (CHD) occurs in almost 1% of newborn children and is considered a multifactorial disorder. CHD may segregate in families due to significant contribution of genetic factors in the disease etiology. The aim of the study was to identify pathophysiological mechanisms in families segregating CHD. METHODS: We used whole exome sequencing to identify rare genetic variants in ninety consenting participants from 32 Danish families with recurrent CHD. We applied a systems biology approach to identify developmental mechanisms influenced by accumulation of rare variants. We used an independent cohort of 714 CHD cases and 4922 controls for replication and performed functional investigations using zebrafish as in vivo model. RESULTS: We identified 1785 genes, in which rare alleles were shared between affected individuals within a family. These genes were enriched for known cardiac developmental genes, and 218 of these genes were mutated in more than one family. Our analysis revealed a functional cluster, enriched for proteins with a known participation in calcium signaling. Replication in an independent cohort confirmed increased mutation burden of calcium-signaling genes in CHD patients. Functional investigation of zebrafish orthologues of ITPR1, PLCB2, and ADCY2 verified a role in cardiac development and suggests a combinatorial effect of inactivation of these genes. CONCLUSIONS: The study identifies abnormal calcium signaling as a novel pathophysiological mechanism in human CHD and confirms the complex genetic architecture underlying CHD.
Subject(s)
Calcium Signaling , Calcium/metabolism , Genetic Predisposition to Disease , Heart Defects, Congenital/genetics , Heart Defects, Congenital/metabolism , Systems Biology/methods , Alleles , Animals , Computational Biology/methods , Databases, Genetic , Denmark , Female , Genetic Association Studies/methods , Genetic Variation , Humans , Male , Protein Interaction Mapping , Protein Interaction Maps , Registries , Exome Sequencing , Zebrafish
ABSTRACT
Valvular heart disease is observed in approximately 2% of the general population [1]. Although the initial observation is often localized (for example, to the aortic or mitral valve), disease manifestations are regularly observed in the other valves and patients frequently require surgery. Despite the high frequency of heart valve disease, only a handful of genes have so far been identified as the monogenic causes of disease [2-7]. Here we identify two consanguineous families, each with two affected family members presenting with progressive heart valve disease early in life. Whole-exome sequencing revealed homozygous, truncating nonsense alleles in ADAMTS19 in all four affected individuals. Homozygous knockout mice for Adamts19 show aortic valve dysfunction, recapitulating aspects of the human phenotype. Expression analysis using a lacZ reporter and single-cell RNA sequencing highlight Adamts19 as a novel marker for valvular interstitial cells; inference of gene regulatory networks in valvular interstitial cells positions Adamts19 in a highly discriminatory network driven by the transcription factor lymphoid enhancer-binding factor 1 downstream of the Wnt signaling pathway. Upregulation of endocardial Krüppel-like factor 2 in Adamts19 knockout mice precedes hemodynamic perturbation, showing that a tight balance in the Wnt-Adamts19-Klf2 axis is required for proper valve maturation and maintenance.
Subject(s)
ADAMTS Proteins/metabolism , Gene Expression Regulation, Developmental , Heart Valve Diseases/etiology , ADAMTS Proteins/genetics , Animals , Family , Female , Heart Valve Diseases/pathology , Humans , Kruppel-Like Transcription Factors/genetics , Kruppel-Like Transcription Factors/metabolism , Male , Mice , Mice, Knockout , Pedigree , Single-Cell Analysis , Wnt Signaling Pathway
ABSTRACT
BACKGROUND: Cardiac disease modelling using human-induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CM) requires thorough insight into cardiac cell type differentiation processes. However, current methods to discriminate different cardiac cell types are mostly time-consuming and costly, and often provide imprecise phenotypic evaluation. DNA methylation plays a critical role during early heart development and cardiac cellular specification. We therefore investigated the DNA methylation pattern in different cardiac tissues to identify CpG loci for further cardiac cell type characterization. RESULTS: An array-based genome-wide DNA methylation analysis using Illumina Infinium HumanMethylation450 BeadChips led to the identification of 168 differentially methylated CpG loci in atrial and ventricular human heart tissue samples (n = 49) from different patients with congenital heart defects (CHD). Systematic evaluation of the atrial-ventricular DNA methylation pattern in an independent sample cohort of non-failing donor hearts and cardiac patients using bisulfite pyrosequencing helped us to define a subset of 16 differentially methylated CpG loci enabling precise characterization of human atrial and ventricular cardiac tissue samples. This defined set of reproducible cardiac tissue-specific DNA methylation sites allowed us to consistently detect the cellular identity of hiPSC-CM subtypes. CONCLUSION: Testing DNA methylation of only a small set of defined CpG sites thus makes it possible to distinguish atrial and ventricular cardiac tissues and cardiac atrial and ventricular subtypes of hiPSC-CMs. This method represents a rapid and reliable system for phenotypic characterization of in vitro-generated cardiomyocytes and opens new opportunities for cardiovascular research and patient-specific therapy.
Subject(s)
DNA Methylation , Heart Atria/cytology , Heart Defects, Congenital/pathology , Heart Ventricles/cytology , Myocytes, Cardiac/cytology , Cells, Cultured , CpG Islands , Female , Heart Atria/chemistry , Heart Defects, Congenital/genetics , Heart Ventricles/chemistry , Humans , Induced Pluripotent Stem Cells/chemistry , Induced Pluripotent Stem Cells/cytology , Male , Models, Biological , Myocytes, Cardiac/chemistry , Organ Specificity , Sequence Analysis, DNA , Tissue Engineering
ABSTRACT
We are moving into the age of 'Big Data' in biomedical research and bioinformatics. This trend can be encapsulated in a simple formula: D = S * F, where the volume of data generated (D) increases in both dimensions: the number of samples (S) and the number of sample features (F). Frequently, a typical omics classification task includes redundant and irrelevant features (e.g. genes or proteins) that can result in long computation times, decreased model performance and the selection of suboptimal features (genes and proteins) after the classification/regression step. Multiple algorithms and reviews have been published describing the existing methods for feature selection (FS) and their strengths and weaknesses. However, the selection of the correct FS algorithm and strategy constitutes an enormous challenge. Despite the number and diversity of algorithms available, the proper choice of an approach for a specific problem often falls in a 'grey zone'. In this study, we select a subset of FS methods to develop an efficient workflow and an R package for bioinformatics machine learning problems. We cover relevant issues concerning FS, ranging from domain problems to algorithm solutions and computational tools. Finally, we use seven different proteomics and gene expression datasets to evaluate the workflow and guide the FS process.
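A univariate filter, one of the simplest feature selection strategies such a workflow covers, can be sketched as follows. The sketch is in Python rather than R, uses a toy t-like score, and is not the package's actual implementation:

```python
from statistics import mean, stdev

def rank_features(X, y, top_k=2):
    """Univariate filter: score each feature by the absolute standardized
    difference of the two class means (a simple t-like statistic) and
    return the indices of the top_k best-separating features."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        pooled = (stdev(a) + stdev(b)) / 2 or 1e-9  # guard against zero spread
        scores.append((abs(mean(a) - mean(b)) / pooled, j))
    return [j for _, j in sorted(scores, reverse=True)[:top_k]]

# toy data: feature 0 separates the classes, feature 1 is noise
X = [[1.0, 5.1], [1.2, 4.9], [0.9, 5.0], [5.0, 5.2], [5.2, 4.8], [4.9, 5.1]]
y = [0, 0, 0, 1, 1, 1]
selected = rank_features(X, y, top_k=1)
```

Filters like this are fast but ignore feature interactions, which is why wrapper and embedded methods also appear in FS workflows.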
Subject(s)
Algorithms , Databases as Topic , Genomics/methods , Workflow , Humans , Multivariate Analysis , Principal Component Analysis , Support Vector Machine
ABSTRACT
In mass spectrometry-based shotgun proteomics, protein identifications are usually the desired result. However, most of the analytical methods are based on the identification of reliable peptides and not the direct identification of intact proteins. Thus, assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Currently, different protein inference algorithms and tools are available for the proteomics community. Here, we evaluated five software tools for protein inference (PIA, ProteinProphet, Fido, ProteinLP, MSBayesPro) using three popular database search engines: Mascot, X!Tandem, and MS-GF+. All the algorithms were evaluated using a highly customizable KNIME workflow on four different public datasets with varying complexities (different sample preparation, species and analytical instruments). We defined a set of quality control metrics to evaluate the performance of each combination of search engines, protein inference algorithm, and parameters on each dataset. We show that the results for complex samples vary not only regarding the actual numbers of reported protein groups but also concerning the actual composition of groups. Furthermore, the robustness of reported proteins when using databases of differing complexities is strongly dependent on the applied inference algorithm. Finally, merging the identifications of multiple search engines does not necessarily increase the number of reported proteins, but does increase the number of peptides per protein and thus can generally be recommended. SIGNIFICANCE: Protein inference is one of the major challenges in MS-based proteomics today. Currently, there is a vast number of protein inference algorithms and implementations available for the proteomics community. Protein assembly impacts the final results of the research, the quantitation values and the final claims of the research manuscript.
Even though protein inference is a crucial step in proteomics data analysis, a comprehensive evaluation of the many different inference methods has never been performed. The Journal of Proteomics has previously published multiple benchmark studies of bioinformatics algorithms (PMID: 26585461; PMID: 22728601), making clear the importance of such studies for the proteomics community and the journal audience. This manuscript presents a new bioinformatics solution based on the KNIME/OpenMS platform that aims at providing a fair comparison of protein inference algorithms (https://github.com/KNIME-OMICS). Five different algorithms - PIA, ProteinProphet, Fido, ProteinLP and MSBayesPro - were evaluated using the highly customizable workflow on four public datasets with varying complexities. Three popular database search engines - Mascot, X!Tandem and MS-GF+ - and combinations thereof were evaluated for every protein inference tool. In total, >186 protein lists were analyzed and carefully compared using three metrics for quality assessment of the protein inference results: 1) the number of reported proteins, 2) the number of peptides per protein, and 3) the number of uniquely reported proteins per inference method. We also examined how many proteins were reported by choosing each combination of search engines, protein inference algorithms and parameters on each dataset. The results show that 1) PIA or Fido seems to be a good choice when studying the results of the analyzed workflow, regarding not only the reported proteins and the high-quality identifications, but also the required runtime; 2) merging the identifications of multiple search engines almost always gives more confident results and increases the number of peptides per protein group; 3) the usage of databases containing not only the canonical, but also known isoforms of proteins has a small impact on the number of reported proteins.
Depending on the question behind the study, the detection of specific isoforms could compensate for the slightly shorter protein lists produced by parsimonious reporting. 4) The current workflow can be easily extended to support new algorithms and search engine combinations.
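Parsimonious protein reporting is essentially a set-cover problem: find a minimal set of proteins that explains all identified peptides. A greedy sketch (not the algorithm of any specific tool evaluated here):

```python
def parsimonious_inference(peptide_to_proteins):
    """Greedy set-cover sketch of parsimonious protein inference: repeatedly
    report the protein that explains the most still-unexplained peptides."""
    uncovered = set(peptide_to_proteins)
    reported = []
    while uncovered:
        counts = {}
        for pep in uncovered:
            for prot in peptide_to_proteins[pep]:
                counts[prot] = counts.get(prot, 0) + 1
        # ties broken lexicographically for deterministic output
        best = max(sorted(counts), key=lambda p: counts[p])
        reported.append(best)
        uncovered -= {p for p in uncovered if best in peptide_to_proteins[p]}
    return reported

# toy peptide-to-protein mapping: P2 alone explains three of four peptides
peptides = {
    "PEPA": ["P1"],
    "PEPB": ["P1", "P2"],
    "PEPC": ["P2"],
    "PEPD": ["P3", "P2"],
}
proteins = parsimonious_inference(peptides)
```

Real tools additionally handle protein groups, shared-peptide weighting and statistical scoring, which is where the evaluated algorithms diverge.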
Subject(s)
Algorithms , Computational Biology/methods , Databases, Protein , Proteomics/methods , Search Engine/methods , Humans , Peptides/chemistry , Protein Isoforms , Software , Tandem Mass Spectrometry
ABSTRACT
A fully validated bio-analytical method based on Matrix-Assisted-Laser-Desorption/Ionization-Time of Flight Mass Spectrometry was developed for quantitation in human plasma of the anti-tumor peptide CIGB-300. An analog of this peptide, acetylated at the N-terminal, was used as internal standard for absolute quantitation. Acid treatment allowed efficient precipitation of plasma proteins as well as high recovery (approximately 80%) of the intact peptide. No other chromatographic step was required for sample processing before MALDI-MS analysis. Spectra were acquired in linear positive ion mode to ensure maximum sensitivity. The lower limit of quantitation was established at 0.5 µg/mL, which is equivalent to 160 fmol peptide. The calibration curve was linear from 0.5 to 7.5 µg/mL, with R² > 0.98, and permitted quantitation of highly concentrated samples evaluated by dilution integrity testing. All parameters assessed for five validation batches met the FDA guidelines for industry. The method was successfully applied to analysis of clinical samples obtained in a phase I clinical trial following intravenous administration of CIGB-300 at a dose of 1.6 mg/kg body weight. With the exception of Cmax and AUC, pharmacokinetic parameters were similar for ELISA and MALDI-MS methods.
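Back-calculating sample concentrations from a linear calibration curve over the validated 0.5-7.5 µg/mL range can be sketched with ordinary least squares. The analyte/internal-standard intensity ratios below are hypothetical, not the study's data:

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a calibration curve."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    slope = num / den
    return slope, my - slope * mx

def back_calculate(ratio, slope, intercept):
    """Interpolate an unknown sample's concentration from its intensity ratio."""
    return (ratio - intercept) / slope

# hypothetical calibrators: concentration (ug/mL) vs analyte/IS ratio
conc = [0.5, 1.5, 3.0, 5.0, 7.5]
ratio = [0.11, 0.32, 0.63, 1.02, 1.55]
slope, intercept = fit_line(conc, ratio)
```

Using an internal-standard ratio rather than the raw analyte signal is what makes the MALDI quantitation robust to shot-to-shot intensity variation.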
Subject(s)
Antineoplastic Agents/blood , Peptides, Cyclic/blood , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Acetylation , Antineoplastic Agents/chemistry , Clinical Trials as Topic , Humans , Injections, Intravenous , Limit of Detection , Neoplasms/blood , Neoplasms/drug therapy , Peptides, Cyclic/chemistry , Reference Standards , Reproducibility of Results , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/instrumentation
ABSTRACT
The field of proteomics has grown at a vertiginous pace in recent years. This is fundamentally due to technological improvements in instrumentation, methods and easy-to-use software, making it possible to address a large number of biological questions and to deepen the study of the proteomes of several organisms. This development has imposed a still-standing challenge: the computational analysis of the large datasets commonly generated in a single proteomics experiment. One alternative for tackling this general issue has been the use of auxiliary information generated during the proteomics experiment to validate the confidence of the identifications. In this manuscript we review the main molecular descriptors used for building predictor models that estimate retention time, isoelectric point and peptide "detectability", which are key tools in the design of several validation strategies based on these criteria. We also give an overview of the main open-source tools and libraries used for computing molecular descriptors.
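Two of the classic descriptors reviewed here, sequence length and mean hydropathy (GRAVY), can be computed directly from the peptide sequence. The Kyte-Doolittle subset shown is illustrative; a real implementation covers all 20 residues:

```python
# Kyte-Doolittle hydropathy values (subset; the full scale covers all residues)
KD = {"A": 1.8, "I": 4.5, "L": 3.8, "V": 4.2, "K": -3.9,
      "D": -3.5, "E": -3.5, "G": -0.4, "S": -0.8, "F": 2.8}

def descriptors(peptide):
    """Two classic descriptors used in retention-time and detectability
    predictor models: sequence length and GRAVY (mean hydropathy)."""
    return {
        "length": len(peptide),
        "gravy": sum(KD[aa] for aa in peptide) / len(peptide),
    }

d = descriptors("ILVA")
```

Predictor models typically combine dozens of such descriptors (charge, bulkiness, composition counts) as the feature vector for regression or classification.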