ABSTRACT
SIGNIFICANCE: Deep profiling of the plasma proteome at scale has been a challenge for traditional approaches. We achieve superior performance across the dimensions of precision, depth, and throughput using a panel of surface-functionalized superparamagnetic nanoparticles in comparison to conventional workflows for deep proteomics interrogation. Our automated workflow leverages competitive nanoparticle-protein binding equilibria that quantitatively compress the large dynamic range of proteomes to an accessible scale. Using machine learning, we dissect the contribution of individual physicochemical properties of nanoparticles to the composition of protein coronas. Our results suggest that nanoparticle functionalization can be tailored to protein sets. This work demonstrates the feasibility of deep, precise, unbiased plasma proteomics at a scale compatible with large-scale genomics, enabling multiomic studies.
Subjects
Blood Proteins, Deep Learning, Nanoparticles, Proteomics, Blood Proteins/chemistry, Nanoparticles/chemistry, Protein Corona/chemistry, Proteome, Proteomics/methods
ABSTRACT
The dynamic range challenge for the detection of proteins and their proteoforms in human plasma has been well documented. Here, we use the nanoparticle protein corona approach to enrich low-abundance proteins selectively and reproducibly from human plasma and use top-down proteomics to quantify differential enrichment for the 2841 detected proteoforms from 114 proteins. Furthermore, nanoparticle enrichment allowed top-down detection of proteoforms between ~1 µg/mL and ~10 pg/mL in absolute abundance, providing up to a 10^5-fold increase in proteome depth over neat plasma in which only proteoforms from abundant proteins (>1 µg/mL) were detected. The ability to monitor medium- and some low-abundance proteoforms through reproducible enrichment significantly extends the applicability of proteoform research by adding depth beyond albumin, immunoglobulins, and apolipoproteins to uncover many involved in immunity and cell signaling. As proteoforms carry unique information content relative to peptides, this report opens the door to deeper proteoform sequencing in clinical proteomics of disease or aging cohorts.
Subjects
Blood Proteins, Nanoparticles, Proteomics, Humans, Proteomics/methods, Blood Proteins/analysis, Blood Proteins/chemistry, Nanoparticles/chemistry, Proteome/analysis, Protein Corona/chemistry
ABSTRACT
There is a significant unmet need for clinical reflex tests that increase the specificity of prostate-specific antigen blood testing, the longstanding but imperfect tool for prostate cancer diagnosis. Towards this endpoint, we present the results from a discovery study that identifies new prostate-specific antigen reflex markers in a large-scale patient serum cohort using differentiating technologies for deep proteomic interrogation. We detect known prostate cancer blood markers as well as novel candidates. Through bioinformatic pathway enrichment and network analysis, we reveal associations of differentially abundant proteins with cytoskeletal, metabolic, and ribosomal activities, all of which have been previously associated with prostate cancer progression. Additionally, optimized machine learning classifier analysis reveals proteomic signatures capable of detecting the disease prior to biopsy, performing on par with an accepted clinical risk calculator benchmark.
Subjects
Tumor Biomarkers, Prostatic Neoplasms, Proteomics, Humans, Male, Prostatic Neoplasms/diagnosis, Prostatic Neoplasms/metabolism, Prostatic Neoplasms/blood, Tumor Biomarkers/blood, Proteomics/methods, Ion Mobility Spectrometry/methods, Prostate-Specific Antigen/blood, Aged, Machine Learning, Middle Aged
ABSTRACT
PURPOSE: The goal of this study was to refine clinical MRS to optimize performance and then determine whether MRS-derived biomarkers reliably identify painful discs, quantify degeneration severity, and forecast surgical outcomes for chronic low back pain (CLBP) patients. METHODS: We performed an observational diagnostic development and accuracy study. Six hundred and twenty-three (623) discs in 139 patients were scanned using MRS, with 275 discs also receiving provocative discography (PD). MRS data were used to quantify spectral features related to disc structure (collagen and proteoglycan) and acidity (lactate, alanine, propionate). Ratios of acidity to structure were used to calculate pain potential. MRS-SCOREs were compared to PD and Pfirrmann grade. Clinical utility was judged by evaluating surgical success for 75 of the subjects who underwent lumbar surgery. RESULTS: Two hundred and six (206) discs had both a successful MRS and independent pain diagnosis. When comparing to PD, MRS had a total accuracy of 85%, sensitivity of 82%, and specificity of 88%. These increased to 93%, 91%, and 93% respectively, in non-herniated discs. The MRS structure measures differed significantly between Pfirrmann grades, except grade I versus grade II. When all MRS positive discs were treated, surgical success was 97% versus 57% when the treated level was MRS negative, or 54% when the non-treated adjacent level was MRS positive. CONCLUSION: MRS correlates with PD and may support improved surgical outcomes for CLBP patients. Noninvasive MRS is a potentially valuable approach to clarifying pain mechanisms and designing CLBP therapies that are customized to the patient. These slides can be retrieved under Electronic Supplementary Material.
Subjects
Intervertebral Disc Degeneration/diagnosis, Intervertebral Disc/metabolism, Low Back Pain/diagnosis, Lumbar Vertebrae/metabolism, Magnetic Resonance Spectroscopy/methods, Adult, Aged, Biomarkers/metabolism, Female, Humans, Intervertebral Disc/pathology, Intervertebral Disc/surgery, Intervertebral Disc Degeneration/surgery, Low Back Pain/surgery, Lumbar Vertebrae/surgery, Magnetic Resonance Imaging/methods, Male, Middle Aged, Myelography, Proteoglycans/metabolism, Sensitivity and Specificity, Treatment Outcome, Young Adult
ABSTRACT
BACKGROUND: The aim was to improve upon an existing blood-based colorectal cancer (CRC) test directed to high-risk symptomatic patients, by developing a new CRC classifier to be used with a new test embodiment. The new test uses a robust assay format, electrochemiluminescence immunoassays, to quantify protein concentrations. The aim was achieved by building and validating a CRC classifier using concentration measures from a large sample set representing a true intent-to-test (ITT) symptomatic population. METHODS: 4435 patient samples were drawn from the Endoscopy II sample set. Samples were collected at seven hospitals across Denmark between 2010 and 2012 from subjects with symptoms of colorectal neoplasia. Colonoscopies revealed the presence or absence of CRC. 27 blood plasma proteins were selected as candidate biomarkers based on previous studies. Multiplexed electrochemiluminescence assays were used to measure the concentrations of these 27 proteins in all 4435 samples. 3066 patients were randomly assigned to the Discovery set, in which machine learning was used to build candidate classifiers. Some classifiers were refined by allowing up to a 25% indeterminate score range. The classifier with the best Discovery set performance was successfully validated in the separate Validation set, consisting of 1336 samples. RESULTS: The final classifier was a logistic regression using ten predictors: eight proteins (A1AG, CEA, CO9, DPPIV, MIF, PKM2, SAA, TFRC), age, and gender. In validation, the indeterminate rate of the new panel was 23.2%, sensitivity/specificity was 0.80/0.83, PPV was 36.5%, and NPV was 97.1%. CONCLUSIONS: The validated classifier serves as the basis of a new blood-based CRC test for symptomatic patients. The improved performance, resulting from robust concentration measures across a large sample set mirroring the ITT population, renders the new test the best available for this population.
Results from a test using this classifier can help assess symptomatic patients' CRC risk, increase their colonoscopy compliance, and manage next steps in their care.
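The predictive values quoted above depend on how common CRC is in the tested population, not only on sensitivity and specificity. The following is a minimal sketch of that relationship using Bayes' rule. The function name and the 10% prevalence are assumptions made for illustration (the abstract does not state the prevalence used), and the indeterminate score range is ignored.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity are taken from the abstract;
# the prevalence is a hypothetical value, not a reported one.
ppv, npv = predictive_values(0.80, 0.83, prevalence=0.10)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Varying `prevalence` shows why a test with moderate PPV can still have a high NPV in a population where most symptomatic patients do not have CRC.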
ABSTRACT
The dynamic range challenge for detection of proteins and their proteoforms in human plasma has been well documented. Here, we use the nanoparticle protein corona approach to enrich low-abundance proteins selectively and reproducibly from human plasma and use top-down proteomics to quantify differential enrichment for the 2841 detected proteoforms from 114 proteins. Furthermore, nanoparticle enrichment allowed top-down detection of proteoforms between ~1 µg/mL and ~10 pg/mL in absolute abundance, providing up to a 10^5-fold increase in proteome depth over neat plasma in which only proteoforms from abundant proteins (>1 µg/mL) were detected. The ability to monitor medium- and some low-abundance proteoforms through reproducible enrichment significantly extends the applicability of proteoform research by adding depth beyond albumin, immunoglobulins, and apolipoproteins to uncover many involved in immunity and cell signaling. As proteoforms carry unique information content relative to peptides, this report opens the door to deeper proteoform sequencing in clinical proteomics of disease or aging cohorts.
ABSTRACT
As spaceflight becomes more common with commercial crews, blood-based measures of crew health can guide both astronaut biomedicine and countermeasures. By profiling plasma proteins, metabolites, and extracellular vesicles/particles (EVPs) from the SpaceX Inspiration4 crew, we generated "spaceflight secretome profiles," which showed significant differences in coagulation, oxidative stress, and brain-enriched proteins. While >93% of differentially abundant proteins (DAPs) in vesicles and metabolites recovered within six months, the majority (73%) of plasma DAPs were still perturbed post-flight. Moreover, these proteomic alterations correlated better with peripheral blood mononuclear cells than whole blood, suggesting that immune cells contribute more DAPs than erythrocytes. Finally, to discern possible mechanisms leading to brain-enriched protein detection and blood-brain barrier (BBB) disruption, we examined protein changes in dissected brains of spaceflight mice, which showed increases in PECAM-1, a marker of BBB integrity. These data highlight how even short-duration spaceflight can disrupt human and murine physiology and identify spaceflight biomarkers that can guide countermeasure development.
Subjects
Blood Coagulation, Blood-Brain Barrier, Brain, Homeostasis, Oxidative Stress, Space Flight, Animals, Humans, Brain/metabolism, Blood-Brain Barrier/metabolism, Mice, Blood Coagulation/physiology, Male, Secretome/metabolism, Inbred C57BL Mice, Extracellular Vesicles/metabolism, Proteomics/methods, Biomarkers/metabolism, Biomarkers/blood, Female, Adult, Blood Proteins/metabolism, Middle Aged, Mononuclear Leukocytes/metabolism, Proteome/metabolism
ABSTRACT
The harsh radiation environment of space induces the degradation and malfunctioning of electronic systems. Current approaches for protecting these microelectronic devices are generally limited to attenuating a single type of radiation or require only selecting components that have undergone the intensive and expensive process to be radiation-hardened by design. Herein, we describe an alternative fabrication strategy to manufacture multimaterial radiation shielding via direct ink writing of custom tungsten and boron nitride composites. The additively manufactured shields were shown to be capable of attenuating multiple species of radiation by tailoring the composition and architecture of the printed composite materials. The shear-induced alignment during the printing process of the anisotropic boron nitride flakes provided a facile method for introducing favorable thermal management characteristics to the shields. This generalized method offers a promising approach for protecting commercially available microelectronic systems from radiation damage and we anticipate this will vastly enhance the capabilities of future satellites and space systems.
ABSTRACT
Advancements in deep plasma proteomics are enabling high-resolution measurement of plasma proteoforms, which may reveal a rich source of novel biomarkers previously concealed by aggregated protein methods. Here, we analyze 188 plasma proteomes from non-small cell lung cancer (NSCLC) subjects and controls to identify NSCLC-associated protein isoforms by examining differentially abundant peptides as a proxy for isoform-specific exon usage. We find four proteins composed of peptides with opposite patterns of abundance between cancer and control subjects. One of these proteins, BMP1, has known isoforms that can explain this differential pattern, for which the abundance of the NSCLC-associated isoform increases with stage of NSCLC progression. The presence of cancer- and control-associated isoforms suggests differential regulation of BMP1 isoforms. The identified BMP1 isoforms have known functional differences, which may reveal insights into mechanisms impacting NSCLC disease progression.
Subjects
Non-Small-Cell Lung Carcinoma, Lung Neoplasms, Humans, Non-Small-Cell Lung Carcinoma/metabolism, Lung Neoplasms/metabolism, Tumor Biomarkers/metabolism, Protein Isoforms/metabolism, Peptides, Bone Morphogenetic Protein 1
ABSTRACT
Background: The wide dynamic range of circulating proteins coupled with the diversity of proteoforms present in plasma has historically impeded comprehensive and quantitative characterization of the plasma proteome at scale. Automated nanoparticle (NP) protein corona-based proteomics workflows can efficiently compress the dynamic range of protein abundances into a mass spectrometry (MS)-accessible detection range. This enhances the depth and scalability of quantitative MS-based methods, which can elucidate the molecular mechanisms of biological processes, discover new protein biomarkers, and improve comprehensiveness of MS-based diagnostics. Methods: Using multi-species spike-in experiments and a cohort, we investigated fold-change accuracy, linearity, precision, and statistical power of the Proteograph™ Product Suite, a deep plasma proteomics workflow, in conjunction with multiple MS instruments. Results: We show that NP-based workflows enable accurate identification (false discovery rate of 1%) of more than 6,000 proteins from plasma (Orbitrap Astral) and, compared to a gold standard neat plasma workflow that is limited to the detection of hundreds of plasma proteins, facilitate quantification of more proteins with accurate fold-changes, high linearity, and precision. Furthermore, we demonstrate high statistical power for the discovery of biomarkers in small- and large-scale cohorts. Conclusions: The automated NP workflow enables high-throughput, deep, and quantitative plasma proteomics investigation with sufficient power to discover new biomarker signatures with peptide-level resolution.
ABSTRACT
In order to fully understand protein kinase networks, new methods are needed to identify regulators and substrates of kinases, especially for weakly expressed proteins. Here we have developed a hybrid computational search algorithm that combines machine learning and expert knowledge to identify kinase docking sites, and used this algorithm to search the human genome for novel MAP kinase substrates and regulators focused on the JNK family of MAP kinases. Predictions were tested by peptide array followed by rigorous biochemical verification with in vitro binding and kinase assays on wild-type and mutant proteins. Using this procedure, we found new 'D-site' class docking sites in previously known JNK substrates (hnRNP-K, PPM1J/PP2Czeta), as well as new JNK-interacting proteins (MLL4, NEIL1). Finally, we identified new D-site-dependent MAPK substrates, including the hedgehog-regulated transcription factors Gli1 and Gli3, suggesting that a direct connection between MAP kinase and hedgehog signaling may occur at the level of these key regulators. These results demonstrate that a genome-wide search for MAP kinase docking sites can be used to find new docking sites and substrates.
Subjects
Algorithms, Artificial Intelligence, Knowledge Bases, Mitogen-Activated Protein Kinases/chemistry, Binding Sites, Human Genome, Humans, Kruppel-Like Transcription Factors/chemistry, Nerve Tissue Proteins/chemistry, Protein Binding, Substrate Specificity, Transcription Factors/chemistry, Zinc Finger Protein GLI1, Zinc Finger Protein Gli3
ABSTRACT
Accurate prediction of the 3-D structure of small molecules is essential in order to understand their physical, chemical, and biological properties, including how they interact with other molecules. Here, we survey the field of high-throughput methods for 3-D structure prediction and set up new target specifications for the next generation of methods. We then introduce COSMOS, a novel data-driven prediction method that utilizes libraries of fragment and torsion angle parameters. We illustrate COSMOS using parameters extracted from the Cambridge Structural Database (CSD) by analyzing their distribution and then evaluating the system's performance in terms of speed, coverage, and accuracy. Results show that COSMOS represents a significant improvement when compared to state-of-the-art prediction methods, particularly in terms of coverage of complex molecular structures, including metal-organics. COSMOS can predict structures for 96.4% of the molecules in the CSD (99.6% organic, 94.6% metal-organic), whereas the widely used commercial method CORINA predicts structures for 68.5% (98.5% organic, 51.6% metal-organic). On the common subset of molecules predicted by both methods, COSMOS makes predictions with an average speed per molecule of 0.15 s (0.10 s organic, 0.21 s metal-organic) and an average rmsd of 1.57 Å (1.26 Å organic, 1.90 Å metal-organic), and CORINA makes predictions with an average speed per molecule of 0.13s (0.18s organic, 0.08s metal-organic) and an average rmsd of 1.60 Å (1.13 Å organic, 2.11 Å metal-organic). COSMOS is available through the ChemDB chemoinformatics Web portal at http://cdb.ics.uci.edu/ .
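The accuracy figures above are root-mean-square deviations (rmsd) between predicted and reference atomic coordinates. The following is a minimal sketch of that metric. It assumes the two structures are already superimposed and list their atoms in the same order; the abstract does not describe how superposition or atom matching was performed, and the function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical two-atom example: every atom is displaced by 1 Å along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(predicted, reference))
```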
Subjects
Algorithms, Chemistry/methods, Informatics/methods, Molecular Models, Molecular Conformation, Factual Databases, Statistical Models, Automated Pattern Recognition/methods
ABSTRACT
Large-scale, unbiased proteomics studies are constrained by the complexity of the plasma proteome. Here we report a highly parallel protein quantitation platform integrating nanoparticle (NP) protein coronas with liquid chromatography-mass spectrometry for efficient proteomic profiling. A protein corona is a protein layer adsorbed onto NPs upon contact with biofluids. Varying the physicochemical properties of engineered NPs translates to distinct protein corona patterns enabling differential and reproducible interrogation of biological samples, including deep sampling of the plasma proteome. Spike experiments confirm a linear signal response. The median coefficient of variation was 22%. We screened 43 NPs and selected a panel of 5, which detect more than 2,000 proteins from 141 plasma samples using a 96-well automated workflow in a pilot non-small cell lung cancer classification study. Our streamlined workflow combines depth of coverage and throughput with precise quantification based on unique interactions between proteins and NPs engineered for deep and scalable quantitative proteomic studies.
Subjects
Blood Proteins/analysis, Non-Small-Cell Lung Carcinoma/diagnosis, Lung Neoplasms/diagnosis, Protein Corona/analysis, Proteomics/methods, Adult, Aged, Aged 80 and Over, Blood Proteins/chemistry, Non-Small-Cell Lung Carcinoma/blood, High Pressure Liquid Chromatography/methods, Differential Diagnosis, Female, Healthy Volunteers, Humans, Lung Neoplasms/blood, Male, Middle Aged, Nanoparticles/chemistry, Pilot Projects, Protein Corona/chemistry, Reproducibility of Results, Tandem Mass Spectrometry/methods, Time Factors
ABSTRACT
MOTIVATION: Small organic molecules, from nucleotides and amino acids to metabolites and drugs, play a fundamental role in chemistry, biology and medicine. As databases of small molecules continue to grow and become more open, it is important to develop the tools to search them efficiently. In order to develop a BLAST-like tool for small molecules, one must first understand the statistical behavior of molecular similarity scores. RESULTS: We develop a new detailed theory of molecular similarity scores that can be applied to a variety of molecular representations and similarity measures. For concreteness, we focus on the most widely used measure: the Tanimoto measure applied to chemical fingerprints. In both the case of empirical fingerprints and fingerprints generated by several stochastic models, we derive accurate approximations for both the distribution and extreme value distribution of similarity scores. These approximations are derived using a ratio of correlated Gaussians approach. The theory enables the calculation of significance scores, such as Z-scores and P-values, and the estimation of the top hits list size. Empirical results obtained using both the random models and real data from the ChemDB database are given to corroborate the theory and show how it can be applied to mine chemical space. AVAILABILITY: Data and related resources are available through http://cdb.ics.uci.edu.
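For readers unfamiliar with the measure, the following is a minimal sketch of the Tanimoto similarity on binary fingerprints, together with a Z-score computed against an empirical background of scores. The Z-score here uses a plain sample mean and standard deviation, which is an assumption made for illustration; it is not the ratio-of-correlated-Gaussians approximation derived in the paper. Fingerprints are represented as sets of "on" bit indices, and all values are hypothetical.

```python
import statistics


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity: |A ∩ B| / |A ∪ B| for sets of 'on' bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def z_score(score, background_scores):
    """Standardize a score against an empirical background distribution."""
    mean = statistics.mean(background_scores)
    sd = statistics.pstdev(background_scores)
    if sd == 0:
        raise ValueError("background scores have zero variance")
    return (score - mean) / sd


# Hypothetical query, hit, and background fingerprints.
query = {1, 2, 3, 8}
hit = {2, 3, 8, 9}
background = [{1, 20}, {3, 30, 31}, {2, 8, 40, 41, 42}]

hit_score = tanimoto(query, hit)
background_scores = [tanimoto(query, fp) for fp in background]
print(hit_score, z_score(hit_score, background_scores))
```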
Subjects
Algorithms, Analytical Chemistry Techniques/methods, Statistical Data Interpretation, Factual Databases, Organic Chemicals/chemistry, Automated Pattern Recognition/methods
ABSTRACT
Over the past 20 years, mass spectrometry (MS) has emerged as a dynamic tool for proteomics biomarker discovery. However, published MS biomarker candidates often do not translate to the clinic, failing during attempts at independent replication. The cause can be shortcomings in study design, sample quality, assay quantitation, and/or quality/process control. To address these shortcomings, we developed an MS workflow in accordance with Tier 2 measurement requirements for targeted peptides, defined by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) "fit-for-purpose" approach, using dynamic multiple reaction monitoring (dMRM), which measures specific peptide transitions during predefined retention time (RT) windows. We describe the development of a robust multiplex dMRM assay measuring 641 proteotypic peptides from 392 colorectal cancer (CRC) related proteins, and the procedures to track and handle sample processing and instrument variation over a four-month study, during which the assay measured blood samples from 1045 patients with CRC symptoms. After data collection, transitions were filtered by signal quality metrics before entering receiver operating characteristic (ROC) analysis. The results demonstrated CRC signal carried by 127 proteins in the symptomatic population. The workflow might be further developed to build Tier 1 assays for clinical tests identifying symptomatic individuals at elevated risk of CRC. SIGNIFICANCE: We developed a dMRM MS method with the rigor of a Tier 2 assay as defined by the CPTAC "fit-for-purpose" approach [1]. Using quality and process control procedures, the assay was used to quantify 641 proteotypic peptides representing 392 CRC-related proteins in plasma from 1045 CRC-symptomatic patients. To our knowledge, this is the largest MRM method applied to the largest study to date. The results showed that 127 of the proteins carried univariate CRC signal in the symptomatic population.
This large number of single biomarkers bodes well for future development of multivariate classifiers to distinguish CRC in the symptomatic population.
Subjects
Tumor Biomarkers/analysis, Colorectal Neoplasms/metabolism, Mass Spectrometry/methods, Proteomics/methods, Adenoma/metabolism, Adenoma/pathology, Adolescent, Adult, Aged, Aged 80 and Over, Tumor Biomarkers/metabolism, Calibration, Carcinoma/metabolism, Carcinoma/pathology, Case-Control Studies, Cohort Studies, Colorectal Neoplasms/pathology, Female, High-Throughput Screening Assays/methods, High-Throughput Screening Assays/standards, Humans, Longitudinal Studies, Male, Mass Spectrometry/standards, Middle Aged, Proteomics/standards, Quality Control, Young Adult
ABSTRACT
In the absence of external stress, the surface tension of a lipid membrane vanishes at equilibrium, and the membrane exhibits long wavelength undulations that can be described as elastic (as opposed to tension-dominated) deformations. These long wavelength fluctuations are generally suppressed in molecular dynamics simulations of membranes, which have typically been carried out on membrane patches with areas <100 nm^2 that are replicated by periodic boundary conditions. As a result, finite system-size effects in molecular dynamics simulations of lipid bilayers have been subject to much discussion in the membrane simulation community for several years, and it has been argued that it is necessary to simulate small membrane patches under tension to properly model the tension-free state of macroscopic membranes. Recent hardware and software advances have made it possible to simulate larger, all-atom systems allowing us to directly address the question of whether the relatively small size of current membrane simulations affects their physical characteristics compared to real macroscopic bilayer systems. In this work, system-size effects on the structure of a DOPC bilayer at 5.4 H2O/lipid are investigated by performing molecular dynamics simulations at constant temperature and isotropic pressure (i.e., vanishing surface tension) of small and large single bilayer patches (72 and 288 lipids, respectively), as well as an explicitly multilamellar system consisting of a stack of five 72-lipid bilayers, all replicated in three dimensions by using periodic boundary conditions. The simulation results are compared to X-ray and neutron diffraction data by using a model-free, reciprocal space approach developed recently in our laboratories. Our analysis demonstrates that finite-size effects are negligible in simulations of DOPC bilayers at low hydration, and suggests that refinements are needed in the simulation force fields.
Subjects
Computer Simulation, Lipid Bilayers/chemistry, Membrane Fluidity, Phosphatidylcholines/chemistry, X-Ray Crystallography, Biological Models, Molecular Conformation, Surface Tension, Water/chemistry
ABSTRACT
BACKGROUND: Well-collected and well-documented sample repositories are necessary for disease biomarker development. The availability of significant numbers of samples with the associated patient information enables biomarker validation to proceed with maximum efficacy and minimum bias. The creation and utilization of such a resource is an important step in the development of blood-based biomarker tests for colorectal cancer. METHODS: We have created a subject data and biological sample resource, Endoscopy II, which is based on 4698 individuals referred for diagnostic colonoscopy in Denmark between May 2010 and November 2012. Of the patients referred based on 1 or more clinical symptoms of colorectal neoplasia, 512 were confirmed by pathology to have colorectal cancer and 399 were confirmed to have advanced adenoma. Using subsets of these sample groups in case-control study designs (300 patients for colorectal cancer, 302 patients for advanced adenoma), 2 panels of plasma-based proteins for colorectal cancer and 1 panel for advanced adenoma were identified and validated based on ELISA data obtained for 28 proteins from the samples. RESULTS: One of the validated colorectal cancer panels was comprised of 8 proteins (CATD, CEA, CO3, CO9, SEPR, AACT, MIF, and PSGL) and had a validation ROC curve area under the curve (AUC) of 0.82 (CI 0.75-0.88). There was no significant difference in the performance between early- and late-stage cancer. The advanced adenoma panel was comprised of 4 proteins (CATD, CLUS, GDF15, SAA1) and had a validation ROC curve AUC of 0.65 (CI 0.56-0.74). CONCLUSIONS: These results suggest that the development of blood-based aids to colorectal cancer detection and diagnosis is feasible.
ABSTRACT
INTRODUCTION: Colorectal cancer (CRC) testing programs reduce mortality; however, approximately 40% of the recommended population who should undergo CRC testing does not. Early colon cancer detection in patient populations ineligible for testing, such as the elderly or those with significant comorbidities, could have clinical benefit. Despite many attempts to identify individual protein markers of this disease, little progress has been made. Targeted mass spectrometry, using multiple reaction monitoring (MRM) technology, enables the simultaneous assessment of groups of candidates for improved detection performance. MATERIALS AND METHODS: A multiplex assay was developed for 187 candidate marker proteins, using 337 peptides monitored through 674 simultaneously measured MRM transitions in a 30-minute liquid chromatography-mass spectrometry analysis of immunodepleted blood plasma. To evaluate the combined candidate marker performance, the present study used 274 individual patient blood plasma samples, 137 with biopsy-confirmed colorectal cancer and 137 age- and gender-matched controls. Using 2 well-matched platforms running 5 days each week, all 274 samples were analyzed in 52 days. RESULTS: Using one half of the data as a discovery set (69 disease cases and 69 control cases), the elastic net feature selection and random forest classifier assembly were used in cross-validation to identify a 15-transition classifier. The mean training receiver operating characteristic area under the curve was 0.82. After final classifier assembly using the entire discovery set, the 136-sample (68 disease cases and 68 control cases) validation set was evaluated. The validation area under the curve was 0.91. At the point of maximum accuracy (84%), the sensitivity was 87% and the specificity was 81%. 
CONCLUSION: These results have demonstrated the ability of simultaneous assessment of candidate marker proteins using high-multiplex, targeted-mass spectrometry to identify a subset group of CRC markers with significant and meaningful performance.
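The performance figures above (area under the ROC curve, sensitivity, and specificity) can be computed from classifier scores as in the following minimal sketch. The AUC is calculated as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The scores and the 0.5 threshold are hypothetical values for illustration, not data from the study.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties = 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(case_scores, control_scores, threshold):
    """Call a sample positive when its score is at or above the threshold."""
    true_pos = sum(score >= threshold for score in case_scores)
    true_neg = sum(score < threshold for score in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)


# Hypothetical classifier outputs.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
print(roc_auc(cases, controls))
print(sensitivity_specificity(cases, controls, threshold=0.5))
```

The pairwise loop is quadratic in the number of samples, which is adequate at the scale of a few hundred samples; a rank-based formulation would be preferable for much larger sets.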
Subjects
Tumor Biomarkers/blood, Colorectal Neoplasms/diagnosis, Early Detection of Cancer/methods, Mass Spectrometry/methods, Adult, Aged, Area Under Curve, Colorectal Neoplasms/blood, Female, Humans, Male, Middle Aged, ROC Curve, Sensitivity and Specificity
ABSTRACT
Power-law distributions have been observed in a wide variety of areas. To our knowledge, however, there has been no systematic observation of power-law distributions in chemoinformatics. Here, we present several examples of power-law distributions arising from the features of small, organic molecules. The distributions of rigid segments and ring systems, the distributions of molecular paths and circular substructures, and the sizes of molecular similarity clusters all show linear trends on log-log rank/frequency plots, suggesting underlying power-law distributions. The number of unique features also follows Heaps'-like laws. The characteristic exponents of the power-laws lie in the 1.5-3 range, consistent with the exponents observed in other power-law phenomena. The power-law nature of these distributions leads to several applications including the prediction of the growth of available data through Heaps' law and the optimal allocation of experimental or computational resources via the 80/20 rule. More importantly, we also show how the power-laws can be leveraged to efficiently compress chemical fingerprints in a lossless manner, useful for the improved storage and retrieval of molecules in large chemical databases.
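A linear trend on a log-log rank/frequency plot can be checked with an ordinary least-squares fit, as in the minimal sketch below. This is only an illustration of the idea: the abstract does not say which estimator the authors used, the function name is hypothetical, and the data are synthetic frequencies generated from an exact power law with an assumed exponent of 1.2.

```python
import math


def fit_loglog_slope(frequencies):
    """Least-squares slope and intercept of log(frequency) versus log(rank)."""
    ranked = sorted(frequencies, reverse=True)
    if len(ranked) < 2 or ranked[-1] <= 0:
        raise ValueError("need at least two strictly positive frequencies")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic rank/frequency data: frequency proportional to rank ** -1.2.
synthetic = [1000.0 / rank ** 1.2 for rank in range(1, 201)]
slope, intercept = fit_loglog_slope(synthetic)
print(f"fitted slope = {slope:.3f}")
```

On real feature counts the points deviate from a straight line at the extremes, so the fitted slope should be read as a descriptive summary and not as a rigorous exponent estimate.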
Subjects
Statistical Models, Organic Chemicals/chemistry, Small Molecule Libraries/chemistry, Cluster Analysis, Markov Chains
ABSTRACT
Many modern chemoinformatics systems for small molecules rely on large fingerprint vector representations, where the components of the vector record the presence or number of occurrences in the molecular graphs of particular combinatorial features, such as labeled paths or labeled trees. These large fingerprint vectors are often compressed to much shorter fingerprint vectors using a lossy compression scheme based on a simple modulo procedure. Here, we combine statistical models of fingerprints with integer entropy codes, such as Golomb and Elias codes, to encode the indices or the run lengths of the fingerprints. After reordering the fingerprint components by decreasing frequency order, the indices are monotone-increasing and the run lengths are quasi-monotone-increasing, and both exhibit power-law distribution trends. We take advantage of these statistical properties to derive new efficient, lossless, compression algorithms for monotone integer sequences: monotone value (MOV) coding and monotone length (MOL) coding. In contrast to lossy systems that use 1024 or more bits of storage per molecule, we can achieve lossless compression of long chemical fingerprints based on circular substructures in slightly over 300 bits per molecule, close to the Shannon entropy limit, using a MOL Elias Gamma code for run lengths. The improvement in storage comes at a modest computational cost. Furthermore, because the compression is lossless, uncompressed similarity (e.g., Tanimoto) between molecules can be computed exactly from their compressed representations, leading to significant improvements in retrieval performance, as shown on six benchmark data sets of druglike molecules.
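To make the run-length idea concrete, the following is a minimal sketch of lossless fingerprint compression using a plain Elias Gamma code on the gaps between consecutive "on" bit indices. It is not the MOV or MOL coding introduced in the paper, which additionally exploits the monotone structure of the sequences; the function names and the example fingerprint are hypothetical, and bit strings are kept as Python text for readability.

```python
def elias_gamma_encode(n):
    """Elias Gamma code of a positive integer, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Elias Gamma codes are defined for integers >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode(bits):
    """Decode a concatenation of Elias Gamma codewords (assumes valid input)."""
    values = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # leading zeros give the codeword length
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def compress_fingerprint(on_bits):
    """Encode the gaps between sorted 'on' bit indices (each gap is >= 1)."""
    encoded = []
    previous = -1
    for index in sorted(set(on_bits)):
        encoded.append(elias_gamma_encode(index - previous))
        previous = index
    return "".join(encoded)


def decompress_fingerprint(bits):
    """Recover the sorted 'on' bit indices from the encoded gaps."""
    indices = []
    previous = -1
    for gap in elias_gamma_decode(bits):
        previous += gap
        indices.append(previous)
    return indices


fingerprint = [0, 3, 4, 10]  # hypothetical 'on' bit positions
packed = compress_fingerprint(fingerprint)
print(packed, decompress_fingerprint(packed))
```

Because small gaps receive short codewords, reordering components so that frequent features come first tends to shorten the output, which is the intuition behind encoding run lengths after the frequency-based reordering described above.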