ABSTRACT
Despite widespread use in targeted tumor testing, multiplex PCR/semiconductor (Ion Torrent) sequencing-based assessment of all comprehensive genomic profiling (CGP) variant classes has been limited. Herein, we describe the development and validation of StrataNGS, a 429-gene, multiplex PCR/semiconductor sequencing-based CGP laboratory-developed test performed on co-isolated DNA and RNA from formalin-fixed, paraffin-embedded tumor specimens with ≥2 mm² tumor surface area. Validation was performed in accordance with MolDX CGP validation guidelines using 1986 clinical formalin-fixed, paraffin-embedded samples and an optimized bioinformatics pipeline developed in-house. Across CGP variant classes, accuracy ranged from 0.945 for tumor mutational burden (TMB) status to >0.999 for mutations and gene fusions, positive predictive value ranged from 0.915 for TMB status to 1.00 for gene fusions, and reproducibility ranged from 0.998 for copy number alterations to 1.00 for splice variants and insertions/deletions. StrataNGS TMB estimates were highly correlated with whole exome- and FoundationOne CDx-determined TMB (Pearson r = 0.998 and 0.960, respectively); TMB reproducibility was 0.996 (concordance correlation coefficient). Limit of detection for all variant classes was <20% tumor content. Together, these results demonstrate that multiplex PCR/semiconductor sequencing-based tumor tissue CGP is feasible using the optimized bioinformatic approaches described herein.
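The agreement statistics named in this abstract (accuracy, positive predictive value, Pearson r, and the concordance correlation coefficient) can be illustrated with a minimal Python sketch. The function names and the toy numbers in the usage note are illustrative assumptions, not the validation pipeline or data from the study; the concordance coefficient shown is the standard Lin formulation, which the abstract does not spell out.

```python
from statistics import fmean


def accuracy(tp, fp, tn, fn):
    """Fraction of all calls that agree with the reference."""
    return (tp + tn) / (tp + fp + tn + fn)


def ppv(tp, fp):
    """Positive predictive value: true positives among all positive calls."""
    return tp / (tp + fp)


def _moments(x, y):
    """Means, population variances, and population covariance of paired values."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, sxx, syy, sxy


def pearson_r(x, y):
    """Pearson correlation: strength of the linear relationship only."""
    _, _, sxx, syy, sxy = _moments(x, y)
    return sxy / (sxx * syy) ** 0.5


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient: penalizes both scatter
    and systematic shift or scale differences between the two measurements."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)
```

For example, with hypothetical paired TMB estimates `x = [1, 2, 3, 4]` and `y = [3, 5, 7, 9]`, `pearson_r(x, y)` is 1 because the relationship is perfectly linear, while `lin_ccc(x, y)` is well below 1 because the two series disagree in absolute value. This difference is one reason a concordance coefficient is a natural choice for reproducibility.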
Subjects
Human Genome , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Multiplex Polymerase Chain Reaction/methods , Neoplasms/genetics , Tumor Biomarkers/genetics , DNA Copy Number Variations , Data Accuracy , Exome , Feasibility Studies , Gene Fusion , Humans , Limit of Detection , Microsatellite Instability , Neoplasms/pathology , Reproducibility of Results , DNA Sequence Analysis/methods , RNA Sequence Analysis/methods
ABSTRACT
PURPOSE: Tissue-based comprehensive genomic profiling (CGP) is increasingly used for treatment selection in patients with advanced cancer; however, tissue availability may limit widespread implementation. Here, we established real-world CGP tissue availability and assessed CGP performance on consecutively received samples. MATERIALS AND METHODS: We conducted a post hoc, nonprespecified analysis of 32,048 consecutive tumor tissue samples received for StrataNGS, a multiplex polymerase chain reaction (PCR)-based CGP (PCR-CGP) test, as part of an ongoing observational trial (NCT03061305). Sample characteristics and PCR-CGP performance were assessed across all tested samples, including exception samples not meeting minimum input quality control (QC) requirements (< 20% tumor content [TC], < 2 mm² tumor surface area [TSA], DNA or RNA yield < 1 ng/µL, or specimen age > 5 years). Tests reporting ≥ 1 prioritized alteration or meeting TC and sequencing QC were considered successful. For prostate carcinoma and lung adenocarcinoma, tests reporting ≥ 1 actionable or informative alteration or meeting TC and sequencing QC were considered actionable. RESULTS: Among 31,165 (97.2%) samples where PCR-CGP was attempted, 10.7% had < 20% TC and 59.2% were small (< 25 mm² TSA). Of 31,101 samples evaluable for input requirements, 8,089 (26.0%) were exceptions not meeting requirements. However, 94.2% of the 31,101 tested samples were successfully reported, including 80.5% of exception samples. Positive predictive value of PCR-CGP for ERBB2 amplification in breast cancer samples that were exceptions and/or failed sequencing QC was 96.7%. Importantly, 84.0% of tested prostate carcinomas and 87.9% of lung adenocarcinomas yielded results informing treatment selection. CONCLUSION: Most real-world tissue samples from patients with advanced cancer desiring CGP are limited, requiring optimized CGP approaches to produce meaningful results. An optimized PCR-CGP test, coupled with an inclusive exception testing policy, delivered reportable results for > 94% of samples, potentially expanding the proportion of CGP-testable patients and impact of biomarker-guided therapies.
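The input QC rules and the success definition stated in this abstract can be written as a short Python sketch. The thresholds come from the abstract; the `Sample` class, its field names, and both function names are hypothetical and serve only as illustration, not as the laboratory's actual accessioning logic.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Field names are illustrative assumptions, not from the study.
    tumor_content_pct: float
    tumor_surface_area_mm2: float
    dna_yield_ng_per_ul: float
    rna_yield_ng_per_ul: float
    specimen_age_years: float


def exception_reasons(s: Sample) -> list[str]:
    """Return every minimum-input requirement the sample fails.
    An empty list means the sample is not an exception."""
    reasons = []
    if s.tumor_content_pct < 20:
        reasons.append("tumor content < 20%")
    if s.tumor_surface_area_mm2 < 2:
        reasons.append("tumor surface area < 2 mm2")
    if s.dna_yield_ng_per_ul < 1 or s.rna_yield_ng_per_ul < 1:
        reasons.append("DNA or RNA yield < 1 ng/uL")
    if s.specimen_age_years > 5:
        reasons.append("specimen age > 5 years")
    return reasons


def is_successful(prioritized_alterations: int,
                  meets_tc: bool,
                  meets_sequencing_qc: bool) -> bool:
    """A test is successful if it reports at least one prioritized
    alteration, or if it meets both TC and sequencing QC."""
    return prioritized_alterations >= 1 or (meets_tc and meets_sequencing_qc)
```

Under this reading, a sample that fails input requirements can still yield a successful test, which is consistent with the reported 80.5% success rate among exception samples.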
Subjects
Human Genome , Neoplasms/genetics , Tumor Biomarkers/genetics , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Multiplex Polymerase Chain Reaction/methods , Neoplasms/pathology , Prospective Studies
ABSTRACT
A FASTA file archive and reference resource has been added to ProteomeCommons.org. Motivation for this new functionality derives from two primary sources. The first is the recent FASTA standardization work done by the Human Proteome Organization's Proteomics Standards Initiative (HUPO-PSI). The second is the general lack of a uniform mechanism to properly cite FASTA files used in a study, and to publicly access such FASTA files post-publication. An extension to the Tranche data sharing network has been developed that includes web pages, documentation, and tools for facilitating the use of FASTA files. These include conversion to the new HUPO-PSI format, and provisions for both citing and publicly archiving FASTA files. This new resource is available immediately, free of charge, and can be accessed at http://www.proteomecommons.org/data/fasta/. Source code for related tools is also freely available under the BSD license.
Subjects
Computational Biology/methods , Proteomics/methods , Database Management Systems , Factual Databases , Protein Databases , Humans , Information Storage and Retrieval , Internet , Peptide Mapping , Peptides/chemistry , Proteome , User-Computer Interface
ABSTRACT
A current focus of proteomics research is the establishment of acceptable confidence measures in the assignment of protein identifications in an unknown sample. Development of new algorithmic approaches would greatly benefit from a standard reference set of spectra for known proteins for the purpose of testing and training. Here we describe an openly available library of mass spectra generated on an ABI 4700 MALDI TOF/TOF from 246 known, individually purified and trypsin-digested protein samples. The initial full release of the Aurum Dataset includes gel images, peak lists, spectra, search result files, decoy database analysis files, FASTA file of protein sequences, manual curation, and summary pages describing protein coverage and peptides matched by MS/MS followed by decoy database analysis using Mascot, Sequest, and X!Tandem. The data are publicly available for use at ProteomeCommons.org.
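As a rough illustration of what a decoy database analysis estimates, the Python sketch below computes a simple target-decoy false discovery rate at a score threshold. The abstract does not state which estimator was applied to the Aurum data, so the formula (decoy hits divided by target hits), the function name, and the input layout are all assumptions.

```python
def decoy_fdr(psms, threshold):
    """Estimate the false discovery rate among target matches scoring at or
    above `threshold`.

    `psms` is an iterable of (score, is_decoy) pairs, one per
    peptide-spectrum match. Decoy hits above the threshold stand in for the
    number of incorrect target hits expected above the same threshold.
    Returns None when no target match passes the threshold.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in psms:
        if score >= threshold:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    if targets == 0:
        return None
    return decoys / targets
```

Because every protein in a reference set of this kind is known in advance, an estimate like this can be compared directly against the true error rate, which is what makes such a dataset useful for testing confidence measures.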
Subjects
Peptide Mapping/methods , Proteomics , Recombinant Proteins/chemistry , Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry/methods , Tandem Mass Spectrometry/methods , Amino Acid Sequence , Factual Databases , Humans , Molecular Sequence Data , Peptide Library , Reference Standards
ABSTRACT
We have recently announced the largest database of protein-ligand complexes, Binding MOAD (Mother of All Databases). After the August 2004 update, Binding MOAD contains 6816 complexes. There are 2220 protein families and 3316 unique ligands. After searching 6000+ crystallography papers, we have obtained binding data for 1793 (27%) of the complexes. We have also created a non-redundant set of complexes with only one complex from each protein family; in that set, 630 (28%) of the unique complexes have binding data. Here, we present information about the data provided at the Binding MOAD website. We also present the results of mining Binding MOAD to map the degree of solvent exposure for binding sites. We have determined that most cavities and ligands (70-85%) are well buried in the complexes. This fits with the common paradigm that a large degree of contact between the ligand and protein is significant in molecular recognition. GoCAV and the GoCAV viewer are the tools we created for this study. To share our data and make our online dataset more useful to other research groups, we have integrated the viewer into the Binding MOAD website (www.BindingMOAD.org).
Subjects
Protein Databases , Proteins/chemistry , Proteins/metabolism , User-Computer Interface , Binding Sites , X-Ray Crystallography , Humans , Internet , Ligands , Chemical Models , Molecular Models , Protein Binding , Solvents/chemistry
ABSTRACT
Multiple reaction monitoring (MRM) is a highly sensitive method of targeted mass spectrometry (MS) that can be used to selectively detect and quantify peptides based on the screening of specified precursor peptide-to-fragment ion transitions. MRM-MS sensitivity depends critically on the tuning of instrument parameters, such as collision energy and cone voltage, for the generation of maximal product ion signal. Although generalized equations and values exist for such instrument parameters, there is no clear indication that optimal signal can be reliably produced for all types of MRM transitions using such an algorithmic approach. To address this issue, we have devised a workflow functional on both Waters Quattro Premier and ABI 4000 QTRAP triple quadrupole instruments that allows rapid determination of the optimal value of any programmable instrument parameter for each MRM transition. Here, we demonstrate the strategy for the optimizations of collision energy and cone voltage, but the method could be applied to other instrument parameters, such as declustering potential, as well. The workflow makes use of the incremental adjustment of the precursor and product m/z values at the hundredth decimal place to create a series of MRM targets at different collision energies that can be cycled through in rapid succession within a single run, avoiding any run-to-run variability in execution or comparison. Results are easily visualized and quantified using the MRM software package Mr. M to determine the optimal instrument parameters for each transition.
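The m/z offset idea described in this abstract can be sketched in a few lines of Python. Each collision energy gets its own copy of the transition, with the precursor and product m/z nudged by 0.01 per step so that the instrument software treats the copies as distinct targets within one run. The function names, dictionary keys, choice to shift both m/z values by the same index, and the example transition values are illustrative assumptions; the actual method files for the Waters and ABI instruments are not described in the abstract.

```python
def ce_scan_targets(precursor_mz, product_mz, ce_values, mz_step=0.01):
    """Build one MRM target per collision energy.

    The i-th target has both m/z values shifted by i * mz_step, which makes
    each entry unique while staying far inside the isolation window of a
    unit-resolution quadrupole.
    """
    targets = []
    for i, ce in enumerate(ce_values):
        targets.append({
            "precursor_mz": round(precursor_mz + i * mz_step, 2),
            "product_mz": round(product_mz + i * mz_step, 2),
            "collision_energy": ce,
        })
    return targets


def best_collision_energy(targets, peak_areas):
    """Pick the collision energy whose target gave the largest peak area.

    `peak_areas` is a list of integrated areas in the same order as
    `targets` (for example, as exported from quantification software).
    """
    best_target, _ = max(zip(targets, peak_areas), key=lambda pair: pair[1])
    return best_target["collision_energy"]
```

For a hypothetical transition 523.30 to 688.40 scanned at collision energies 20, 22, 24, and 26, the third target is 523.32 to 688.42 at 24. Because all targets are acquired in the same run, their peak areas can be compared directly without run-to-run variability.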
Subjects
Mass Spectrometry/methods , Proteomics/methods , Algorithms , Amino Acid Sequence , Area Under Curve , Biomarkers/chemistry , Computational Biology/methods , Fungal Proteins/chemistry , Haemophilus influenzae/metabolism , Ions , Molecular Sequence Data , Peptides/chemistry , Proteome , Software
ABSTRACT
Unidentified tandem mass spectra typically represent 50-90% of the spectra acquired in proteomics studies. This manuscript describes a novel algorithm, "Bonanza", for clustering spectra without knowledge of peptide or protein identifications. Further analysis leverages existing peptide identifications to infer related, likely valid identifications. Significantly more spectra can be identified with this approach, including spectra with unexpected potential modifications or amino-acid substitutions.
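To make the idea of clustering spectra without identifications concrete, the Python sketch below groups spectra by cosine similarity of binned peak intensities using a greedy single pass. This is a generic stand-in chosen for brevity, not the Bonanza algorithm, whose details are not given in the abstract; the bin width, similarity threshold, and function names are all assumptions.

```python
from math import sqrt


def bin_spectrum(peaks, bin_width=1.0):
    """Collapse (m/z, intensity) peaks into a sparse vector keyed by m/z bin."""
    vec = {}
    for mz, intensity in peaks:
        b = int(mz / bin_width)
        vec[b] = vec.get(b, 0.0) + intensity
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def greedy_cluster(spectra, threshold=0.8):
    """Assign each spectrum to the first cluster whose representative
    (its first member) is similar enough; otherwise start a new cluster.
    Returns clusters as lists of indices into `spectra`."""
    vectors = [bin_spectrum(s) for s in spectra]
    clusters = []
    for i, v in enumerate(vectors):
        for members in clusters:
            if cosine(v, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Once spectra are grouped this way, an identification held by one member of a cluster can be proposed for its unidentified neighbors, which is the kind of inference the abstract describes.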
Subjects
Algorithms , Cluster Analysis , Post-Translational Protein Processing , Proteomics/methods , Tandem Mass Spectrometry , Amino Acid Sequence , Amino Acid Substitution , Molecular Sequence Data , Peptides/chemistry , Peptides/genetics , Reproducibility of Results
ABSTRACT
MOTIVATION: Comparing tandem mass spectra (MSMS) against a known dataset of protein sequences is a common method for identifying unknown proteins; however, the processing of MSMS by current software often limits certain applications, including comprehensive coverage of post-translational modifications, non-specific searches and real-time searches to allow result-dependent instrument control. This problem deserves attention as new mass spectrometers provide the ability for higher throughput and as known protein datasets rapidly grow in size. New software algorithms need to be devised in order to address the performance issues of conventional MSMS protein dataset-based protein identification. METHODS: This paper describes a novel algorithm based on converting a collection of monoisotopic, centroided spectra to a new data structure, named 'peptide finite state machine' (PFSM), which may be used to rapidly search a known dataset of protein sequences, regardless of the number of spectra searched or the number of potential modifications examined. The algorithm is verified using a set of commercially available tryptic digest protein standards analyzed using an ABI 4700 MALDI TOF/TOF mass spectrometer, and a free, open source PFSM implementation. It is illustrated that a PFSM can accurately search large collections of spectra against large datasets of protein sequences (e.g. NCBI nr) using a regular desktop PC; however, this paper only details the method for identifying peptide and subsequently protein candidates from a dataset of known protein sequences. The concept of using a PFSM as a peptide pre-screening technique for MSMS-based search engines is validated by using PFSM with Mascot and X!Tandem.
AVAILABILITY: Complete source code, documentation and examples for the reference PFSM implementation are freely available at the Proteome Commons, http://www.proteomecommons.org, and source code may be used both commercially and non-commercially as long as the original authors are credited for their work.
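The pre-screening idea, in which the spectra are turned into a lookup structure once and the protein dataset is then streamed past it, can be illustrated with a much simpler stand-in than a PFSM. The Python sketch below builds a sorted index of observed precursor masses and makes a single pass over the protein sequences, looking up each tryptic peptide mass in the index. It is not the PFSM algorithm; the function names, the mass tolerance, the simplified cleavage rule, and the use of neutral masses are all assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(sequence):
    """Cleave after every K or R. Simplified: no proline rule and no
    missed cleavages."""
    start = 0
    for i, aa in enumerate(sequence):
        if aa in "KR":
            yield sequence[start:i + 1]
            start = i + 1
    if start < len(sequence):
        yield sequence[start:]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def candidates(precursor_masses, proteins, tol=0.02):
    """Map each observed neutral precursor mass to the (protein, peptide)
    pairs whose theoretical mass lies within `tol` Da of it.

    The index is built once from the spectra, and the protein dataset is
    traversed a single time regardless of how many spectra are searched."""
    index = sorted(precursor_masses)
    hits = {}
    for name, sequence in proteins.items():
        for pep in tryptic_peptides(sequence):
            m = peptide_mass(pep)
            lo = bisect_left(index, m - tol)
            hi = bisect_right(index, m + tol)
            for observed in index[lo:hi]:
                hits.setdefault(observed, []).append((name, pep))
    return hits
```

In this sketch the cost of each lookup grows only logarithmically with the number of spectra. The abstract claims more for the PFSM, namely search speed that is independent of the number of spectra and of the number of modifications examined, which this simplified index does not attempt to reproduce.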