Results 1 - 19 of 19
1.
Nat Commun ; 15(1): 4862, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38862464

ABSTRACT

As spaceflight becomes more common with commercial crews, blood-based measures of crew health can guide both astronaut biomedicine and countermeasures. By profiling plasma proteins, metabolites, and extracellular vesicles/particles (EVPs) from the SpaceX Inspiration4 crew, we generated "spaceflight secretome profiles," which showed significant differences in coagulation, oxidative stress, and brain-enriched proteins. While >93% of differentially abundant proteins (DAPs) in vesicles and metabolites recovered within six months, the majority (73%) of plasma DAPs were still perturbed post-flight. Moreover, these proteomic alterations correlated better with peripheral blood mononuclear cells than whole blood, suggesting that immune cells contribute more DAPs than erythrocytes. Finally, to discern possible mechanisms leading to brain-enriched protein detection and blood-brain barrier (BBB) disruption, we examined protein changes in dissected brains of spaceflight mice, which showed increases in PECAM-1, a marker of BBB integrity. These data highlight how even short-duration spaceflight can disrupt human and murine physiology and identify spaceflight biomarkers that can guide countermeasure development.


Subject(s)
Blood Coagulation , Blood-Brain Barrier , Brain , Homeostasis , Oxidative Stress , Space Flight , Animals , Humans , Brain/metabolism , Blood-Brain Barrier/metabolism , Mice , Blood Coagulation/physiology , Male , Secretome/metabolism , Mice, Inbred C57BL , Extracellular Vesicles/metabolism , Proteomics/methods , Biomarkers/metabolism , Biomarkers/blood , Female , Adult , Blood Proteins/metabolism , Middle Aged , Leukocytes, Mononuclear/metabolism , Proteome/metabolism
2.
bioRxiv ; 2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37693476

ABSTRACT

Background: The wide dynamic range of circulating proteins coupled with the diversity of proteoforms present in plasma has historically impeded comprehensive and quantitative characterization of the plasma proteome at scale. Automated nanoparticle (NP) protein corona-based proteomics workflows can efficiently compress the dynamic range of protein abundances into a mass spectrometry (MS)-accessible detection range. This enhances the depth and scalability of quantitative MS-based methods, which can elucidate the molecular mechanisms of biological processes, discover new protein biomarkers, and improve the comprehensiveness of MS-based diagnostics. Methods: Using multi-species spike-in experiments and a cohort, we investigated fold-change accuracy, linearity, precision, and statistical power of the Proteograph™ Product Suite, a deep plasma proteomics workflow, in conjunction with multiple MS instruments. Results: We show that NP-based workflows enable accurate identification (false discovery rate of 1%) of more than 6,000 proteins from plasma (Orbitrap Astral) and, compared to a gold standard neat plasma workflow that is limited to the detection of hundreds of plasma proteins, facilitate quantification of more proteins with accurate fold-changes, high linearity, and precision. Furthermore, we demonstrate high statistical power for the discovery of biomarkers in small- and large-scale cohorts. Conclusions: The automated NP workflow enables high-throughput, deep, and quantitative plasma proteomics investigation with sufficient power to discover new biomarker signatures with peptide-level resolution.

3.
ACS Appl Mater Interfaces ; 15(29): 35400-35410, 2023 Jul 26.
Article in English | MEDLINE | ID: mdl-37289198

ABSTRACT

The harsh radiation environment of space induces the degradation and malfunctioning of electronic systems. Current approaches for protecting these microelectronic devices are generally limited to attenuating a single type of radiation or require selecting only components that have undergone the intensive and expensive process of being radiation-hardened by design. Herein, we describe an alternative fabrication strategy to manufacture multimaterial radiation shielding via direct ink writing of custom tungsten and boron nitride composites. The additively manufactured shields were shown to be capable of attenuating multiple species of radiation by tailoring the composition and architecture of the printed composite materials. The shear-induced alignment during the printing process of the anisotropic boron nitride flakes provided a facile method for introducing favorable thermal management characteristics to the shields. This generalized method offers a promising approach for protecting commercially available microelectronic systems from radiation damage, and we anticipate this will vastly enhance the capabilities of future satellites and space systems.

4.
PLoS One ; 18(3): e0282821, 2023.
Article in English | MEDLINE | ID: mdl-36989217

ABSTRACT

Advancements in deep plasma proteomics are enabling high-resolution measurement of plasma proteoforms, which may reveal a rich source of novel biomarkers previously concealed by aggregated protein methods. Here, we analyze 188 plasma proteomes from non-small cell lung cancer (NSCLC) subjects and controls to identify NSCLC-associated protein isoforms by examining differentially abundant peptides as a proxy for isoform-specific exon usage. We find four proteins composed of peptides with opposite patterns of abundance between cancer and control subjects. One of these proteins, BMP1, has known isoforms that can explain this differential pattern, for which the abundance of the NSCLC-associated isoform increases with stage of NSCLC progression. The presence of cancer and control-associated isoforms suggests differential regulation of BMP1 isoforms. The identified BMP1 isoforms have known functional differences, which may reveal insights into mechanisms impacting NSCLC disease progression.


Subject(s)
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Humans , Carcinoma, Non-Small-Cell Lung/metabolism , Lung Neoplasms/metabolism , Biomarkers, Tumor/metabolism , Protein Isoforms/metabolism , Peptides , Bone Morphogenetic Protein 1
5.
Proc Natl Acad Sci U S A ; 119(11): e2106053119, 2022 03 15.
Article in English | MEDLINE | ID: mdl-35275789

ABSTRACT

Significance: Deep profiling of the plasma proteome at scale has been a challenge for traditional approaches. We achieve superior performance across the dimensions of precision, depth, and throughput using a panel of surface-functionalized superparamagnetic nanoparticles in comparison to conventional workflows for deep proteomics interrogation. Our automated workflow leverages competitive nanoparticle-protein binding equilibria that quantitatively compress the large dynamic range of proteomes to an accessible scale. Using machine learning, we dissect the contribution of individual physicochemical properties of nanoparticles to the composition of protein coronas. Our results suggest that nanoparticle functionalization can be tailored to protein sets. This work demonstrates the feasibility of deep, precise, unbiased plasma proteomics at a scale compatible with large-scale genomics enabling multiomic studies.


Subject(s)
Blood Proteins , Deep Learning , Nanoparticles , Proteomics , Blood Proteins/chemistry , Nanoparticles/chemistry , Protein Corona/chemistry , Proteome , Proteomics/methods
6.
Nat Commun ; 11(1): 3662, 2020 07 22.
Article in English | MEDLINE | ID: mdl-32699280

ABSTRACT

Large-scale, unbiased proteomics studies are constrained by the complexity of the plasma proteome. Here we report a highly parallel protein quantitation platform integrating nanoparticle (NP) protein coronas with liquid chromatography-mass spectrometry for efficient proteomic profiling. A protein corona is a protein layer adsorbed onto NPs upon contact with biofluids. Varying the physicochemical properties of engineered NPs translates to distinct protein corona patterns enabling differential and reproducible interrogation of biological samples, including deep sampling of the plasma proteome. Spike experiments confirm a linear signal response. The median coefficient of variation was 22%. We screened 43 NPs and selected a panel of 5, which detect more than 2,000 proteins from 141 plasma samples using a 96-well automated workflow in a pilot non-small cell lung cancer classification study. Our streamlined workflow combines depth of coverage and throughput with precise quantification based on unique interactions between proteins and NPs engineered for deep and scalable quantitative proteomic studies.
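
As an illustration of the precision figure quoted above, the following minimal sketch shows one common way a median coefficient of variation (CV) could be computed across proteins. The replicate intensities and protein names are hypothetical and are not data from the study; the abstract does not state exactly how its CV was calculated.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean, in percent."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical replicate intensities for three proteins (illustrative only).
replicates = {
    "protein_A": [100.0, 120.0, 80.0],
    "protein_B": [50.0, 55.0, 45.0],
    "protein_C": [10.0, 14.0, 6.0],
}

# One CV per protein, then the median across proteins.
per_protein_cv = {name: coefficient_of_variation(v) for name, v in replicates.items()}
median_cv = median(per_protein_cv.values())
print(f"median CV = {median_cv:.1f}%")
```

The median is used here rather than the mean because it is less sensitive to a few poorly quantified proteins, which is presumably why a median is the figure reported.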


Subject(s)
Blood Proteins/analysis , Carcinoma, Non-Small-Cell Lung/diagnosis , Lung Neoplasms/diagnosis , Protein Corona/analysis , Proteomics/methods , Adult , Aged , Aged, 80 and over , Blood Proteins/chemistry , Carcinoma, Non-Small-Cell Lung/blood , Chromatography, High Pressure Liquid/methods , Diagnosis, Differential , Female , Healthy Volunteers , Humans , Lung Neoplasms/blood , Male , Middle Aged , Nanoparticles/chemistry , Pilot Projects , Protein Corona/chemistry , Reproducibility of Results , Tandem Mass Spectrometry/methods , Time Factors
7.
Eur Spine J ; 28(4): 674-687, 2019 04.
Article in English | MEDLINE | ID: mdl-30610465

ABSTRACT

PURPOSE: The goal of this study was to refine clinical MRS to optimize performance and then determine whether MRS-derived biomarkers reliably identify painful discs, quantify degeneration severity, and forecast surgical outcomes for chronic low back pain (CLBP) patients. METHODS: We performed an observational diagnostic development and accuracy study. Six hundred and twenty-three (623) discs in 139 patients were scanned using MRS, with 275 discs also receiving provocative discography (PD). MRS data were used to quantify spectral features related to disc structure (collagen and proteoglycan) and acidity (lactate, alanine, propionate). Ratios of acidity to structure were used to calculate pain potential. MRS-SCOREs were compared to PD and Pfirrmann grade. Clinical utility was judged by evaluating surgical success for 75 of the subjects who underwent lumbar surgery. RESULTS: Two hundred and six (206) discs had both a successful MRS and independent pain diagnosis. When comparing to PD, MRS had a total accuracy of 85%, sensitivity of 82%, and specificity of 88%. These increased to 93%, 91%, and 93% respectively, in non-herniated discs. The MRS structure measures differed significantly between Pfirrmann grades, except grade I versus grade II. When all MRS positive discs were treated, surgical success was 97% versus 57% when the treated level was MRS negative, or 54% when the non-treated adjacent level was MRS positive. CONCLUSION: MRS correlates with PD and may support improved surgical outcomes for CLBP patients. Noninvasive MRS is a potentially valuable approach to clarifying pain mechanisms and designing CLBP therapies that are customized to the patient. These slides can be retrieved under Electronic Supplementary Material.
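
The accuracy, sensitivity, and specificity reported above are standard functions of a 2x2 confusion table. The sketch below shows the arithmetic, treating provocative discography as the reference test. The counts are hypothetical values chosen to reproduce the quoted percentages on a table of 100 discs; they are not the study's actual counts.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, and specificity from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # fraction of reference-positive discs detected
        "specificity": tn / (tn + fp),  # fraction of reference-negative discs cleared
    }


# Hypothetical counts (assumption): 50 painful and 50 non-painful discs.
metrics = diagnostic_metrics(tp=41, fp=6, tn=44, fn=9)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```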


Subject(s)
Intervertebral Disc Degeneration/diagnosis , Intervertebral Disc/metabolism , Low Back Pain/diagnosis , Lumbar Vertebrae/metabolism , Magnetic Resonance Spectroscopy/methods , Adult , Aged , Biomarkers/metabolism , Female , Humans , Intervertebral Disc/pathology , Intervertebral Disc/surgery , Intervertebral Disc Degeneration/surgery , Low Back Pain/surgery , Lumbar Vertebrae/surgery , Magnetic Resonance Imaging/methods , Male , Middle Aged , Myelography , Proteoglycans/metabolism , Sensitivity and Specificity , Treatment Outcome , Young Adult
8.
J Proteomics ; 187: 80-92, 2018 09 15.
Article in English | MEDLINE | ID: mdl-29953963

ABSTRACT

Over the past 20 years, mass spectrometry (MS) has emerged as a dynamic tool for proteomics biomarker discovery. However, published MS biomarker candidates often do not translate to the clinic, failing during attempts at independent replication. The cause can be shortcomings in study design, sample quality, assay quantitation, and/or quality/process control. To address these shortcomings, we developed an MS workflow in accordance with Tier 2 measurement requirements for targeted peptides, defined by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) "fit-for-purpose" approach, using dynamic multiple reaction monitoring (dMRM), which measures specific peptide transitions during predefined retention time (RT) windows. We describe the development of a robust multiplex dMRM assay measuring 641 proteotypic peptides from 392 colorectal cancer (CRC) related proteins, and the procedures to track and handle sample processing and instrument variation over a four-month study, during which the assay measured blood samples from 1045 patients with CRC symptoms. After data collection, transitions were filtered by signal quality metrics before entering receiver operating characteristic (ROC) analysis. The results demonstrated CRC signal carried by 127 proteins in the symptomatic population. The workflow might be further developed to build Tier 1 assays for clinical tests identifying symptomatic individuals at elevated risk of CRC. SIGNIFICANCE: We developed a dMRM MS method with the rigor of a Tier 2 assay as defined by the CPTAC 'fit for purpose approach' [1]. Using quality and process control procedures, the assay was used to quantify 641 proteotypic peptides representing 392 CRC-related proteins in plasma from 1045 CRC-symptomatic patients. To our knowledge, this is the largest MRM method applied to the largest study to date. The results showed that 127 of the proteins carried univariate CRC signal in the symptomatic population. This large number of single biomarkers bodes well for future development of multivariate classifiers to distinguish CRC in the symptomatic population.


Subject(s)
Biomarkers, Tumor/analysis , Colorectal Neoplasms/metabolism , Mass Spectrometry/methods , Proteomics/methods , Adenoma/metabolism , Adenoma/pathology , Adolescent , Adult , Aged , Aged, 80 and over , Biomarkers, Tumor/metabolism , Calibration , Carcinoma/metabolism , Carcinoma/pathology , Case-Control Studies , Cohort Studies , Colorectal Neoplasms/pathology , Female , High-Throughput Screening Assays/methods , High-Throughput Screening Assays/standards , Humans , Longitudinal Studies , Male , Mass Spectrometry/standards , Middle Aged , Proteomics/standards , Quality Control , Young Adult
9.
Clin Proteomics ; 14: 28, 2017.
Article in English | MEDLINE | ID: mdl-28769740

ABSTRACT

BACKGROUND: The aim was to improve upon an existing blood-based colorectal cancer (CRC) test directed to high-risk symptomatic patients, by developing a new CRC classifier to be used with a new test embodiment. The new test uses a robust assay format, electrochemiluminescence immunoassays, to quantify protein concentrations. The aim was achieved by building and validating a CRC classifier using concentration measures from a large sample set representing a true intent-to-test (ITT) symptomatic population. METHODS: 4435 patient samples were drawn from the Endoscopy II sample set. Samples were collected at seven hospitals across Denmark between 2010 and 2012 from subjects with symptoms of colorectal neoplasia. Colonoscopies revealed the presence or absence of CRC. 27 blood plasma proteins were selected as candidate biomarkers based on previous studies. Multiplexed electrochemiluminescence assays were used to measure the concentrations of these 27 proteins in all 4435 samples. 3066 patients were randomly assigned to the Discovery set, in which machine learning was used to build candidate classifiers. Some classifiers were refined by allowing up to a 25% indeterminate score range. The classifier with the best Discovery set performance was successfully validated in the separate Validation set, consisting of 1336 samples. RESULTS: The final classifier was a logistic regression using ten predictors: eight proteins (A1AG, CEA, CO9, DPPIV, MIF, PKM2, SAA, TFRC), age, and gender. In validation, the indeterminate rate of the new panel was 23.2%, sensitivity/specificity was 0.80/0.83, PPV was 36.5%, and NPV was 97.1%. CONCLUSIONS: The validated classifier serves as the basis of a new blood-based CRC test for symptomatic patients. The improved performance, resulting from robust concentration measures across a large sample set mirroring the ITT population, renders the new test the best available for this population. Results from a test using this classifier can help assess symptomatic patients' CRC risk, increase their colonoscopy compliance, and manage next steps in their care.
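
Positive and negative predictive values depend on disease prevalence as well as on sensitivity and specificity. The sketch below applies Bayes' rule to the reported sensitivity/specificity of 0.80/0.83. The prevalence of 10.9% is an assumption chosen for illustration (the abstract does not state it); under that assumption the formulas give values close to the reported PPV and NPV.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at the given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed prevalence among patients with a determinate score (not from the source).
ASSUMED_PREVALENCE = 0.109

ppv, npv = predictive_values(0.80, 0.83, ASSUMED_PREVALENCE)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

The low PPV alongside a high NPV is typical when prevalence is low: most positives come from the much larger disease-free group, while a negative result rules out disease with high probability.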

10.
Clin Colorectal Cancer ; 15(2): 186-194.e13, 2016 06.
Article in English | MEDLINE | ID: mdl-27237338

ABSTRACT

INTRODUCTION: Colorectal cancer (CRC) testing programs reduce mortality; however, approximately 40% of the recommended population who should undergo CRC testing does not. Early colon cancer detection in patient populations ineligible for testing, such as the elderly or those with significant comorbidities, could have clinical benefit. Despite many attempts to identify individual protein markers of this disease, little progress has been made. Targeted mass spectrometry, using multiple reaction monitoring (MRM) technology, enables the simultaneous assessment of groups of candidates for improved detection performance. MATERIALS AND METHODS: A multiplex assay was developed for 187 candidate marker proteins, using 337 peptides monitored through 674 simultaneously measured MRM transitions in a 30-minute liquid chromatography-mass spectrometry analysis of immunodepleted blood plasma. To evaluate the combined candidate marker performance, the present study used 274 individual patient blood plasma samples, 137 with biopsy-confirmed colorectal cancer and 137 age- and gender-matched controls. Using 2 well-matched platforms running 5 days each week, all 274 samples were analyzed in 52 days. RESULTS: Using one half of the data as a discovery set (69 disease cases and 69 control cases), the elastic net feature selection and random forest classifier assembly were used in cross-validation to identify a 15-transition classifier. The mean training receiver operating characteristic area under the curve was 0.82. After final classifier assembly using the entire discovery set, the 136-sample (68 disease cases and 68 control cases) validation set was evaluated. The validation area under the curve was 0.91. At the point of maximum accuracy (84%), the sensitivity was 87% and the specificity was 81%. CONCLUSION: These results have demonstrated the ability of simultaneous assessment of candidate marker proteins using high-multiplex, targeted-mass spectrometry to identify a subset group of CRC markers with significant and meaningful performance.
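
The area under the ROC curve quoted above can be read as the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control. The sketch below computes it directly from that definition. The scores are hypothetical, and this pairwise method is only one of several equivalent ways to obtain the AUC; the abstract does not say which the authors used.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical classifier scores (illustrative only).
cases = [0.9, 0.8, 0.4]
controls = [0.5, 0.3, 0.2]
auc = roc_auc(cases, controls)
print(f"AUC = {auc:.3f}")
```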


Subject(s)
Biomarkers, Tumor/blood , Colorectal Neoplasms/diagnosis , Early Detection of Cancer/methods , Mass Spectrometry/methods , Adult , Aged , Area Under Curve , Colorectal Neoplasms/blood , Female , Humans , Male , Middle Aged , ROC Curve , Sensitivity and Specificity
11.
J Appl Lab Med ; 1(2): 181-193, 2016 Sep 01.
Article in English | MEDLINE | ID: mdl-33626780

ABSTRACT

BACKGROUND: Well-collected and well-documented sample repositories are necessary for disease biomarker development. The availability of significant numbers of samples with the associated patient information enables biomarker validation to proceed with maximum efficacy and minimum bias. The creation and utilization of such a resource is an important step in the development of blood-based biomarker tests for colorectal cancer. METHODS: We have created a subject data and biological sample resource, Endoscopy II, which is based on 4698 individuals referred for diagnostic colonoscopy in Denmark between May 2010 and November 2012. Of the patients referred based on 1 or more clinical symptoms of colorectal neoplasia, 512 were confirmed by pathology to have colorectal cancer and 399 were confirmed to have advanced adenoma. Using subsets of these sample groups in case-control study designs (300 patients for colorectal cancer, 302 patients for advanced adenoma), 2 panels of plasma-based proteins for colorectal cancer and 1 panel for advanced adenoma were identified and validated based on ELISA data obtained for 28 proteins from the samples. RESULTS: One of the validated colorectal cancer panels was comprised of 8 proteins (CATD, CEA, CO3, CO9, SEPR, AACT, MIF, and PSGL) and had a validation ROC curve area under the curve (AUC) of 0.82 (CI 0.75-0.88). There was no significant difference in the performance between early- and late-stage cancer. The advanced adenoma panel was comprised of 4 proteins (CATD, CLUS, GDF15, SAA1) and had a validation ROC curve AUC of 0.65 (CI 0.56-0.74). CONCLUSIONS: These results suggest that the development of blood-based aids to colorectal cancer detection and diagnosis is feasible.

12.
J Chem Inf Model ; 51(4): 760-76, 2011 Apr 25.
Article in English | MEDLINE | ID: mdl-21417267

ABSTRACT

Accurate prediction of the 3-D structure of small molecules is essential in order to understand their physical, chemical, and biological properties, including how they interact with other molecules. Here, we survey the field of high-throughput methods for 3-D structure prediction and set up new target specifications for the next generation of methods. We then introduce COSMOS, a novel data-driven prediction method that utilizes libraries of fragment and torsion angle parameters. We illustrate COSMOS using parameters extracted from the Cambridge Structural Database (CSD) by analyzing their distribution and then evaluating the system's performance in terms of speed, coverage, and accuracy. Results show that COSMOS represents a significant improvement when compared to state-of-the-art prediction methods, particularly in terms of coverage of complex molecular structures, including metal-organics. COSMOS can predict structures for 96.4% of the molecules in the CSD (99.6% organic, 94.6% metal-organic), whereas the widely used commercial method CORINA predicts structures for 68.5% (98.5% organic, 51.6% metal-organic). On the common subset of molecules predicted by both methods, COSMOS makes predictions with an average speed per molecule of 0.15 s (0.10 s organic, 0.21 s metal-organic) and an average rmsd of 1.57 Å (1.26 Å organic, 1.90 Å metal-organic), and CORINA makes predictions with an average speed per molecule of 0.13 s (0.18 s organic, 0.08 s metal-organic) and an average rmsd of 1.60 Å (1.13 Å organic, 2.11 Å metal-organic). COSMOS is available through the ChemDB chemoinformatics Web portal at http://cdb.ics.uci.edu/.
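
The accuracy figures above are root-mean-square deviations (rmsd) between predicted and reference atomic coordinates. The sketch below shows the basic rmsd formula for two structures with atoms in matching order. It deliberately omits the optimal superposition (rotation and translation) step that structure comparisons normally apply first, and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-atom structures, in angstroms; every atom is shifted by 1 along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(f"rmsd = {rmsd(predicted, reference):.2f} A")
```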


Subject(s)
Algorithms , Chemistry/methods , Informatics/methods , Models, Molecular , Molecular Conformation , Databases, Factual , Models, Statistical , Pattern Recognition, Automated/methods
13.
PLoS Comput Biol ; 6(8)2010 Aug 26.
Article in English | MEDLINE | ID: mdl-20865152

ABSTRACT

In order to fully understand protein kinase networks, new methods are needed to identify regulators and substrates of kinases, especially for weakly expressed proteins. Here we have developed a hybrid computational search algorithm that combines machine learning and expert knowledge to identify kinase docking sites, and used this algorithm to search the human genome for novel MAP kinase substrates and regulators focused on the JNK family of MAP kinases. Predictions were tested by peptide array followed by rigorous biochemical verification with in vitro binding and kinase assays on wild-type and mutant proteins. Using this procedure, we found new 'D-site' class docking sites in previously known JNK substrates (hnRNP-K, PPM1J/PP2Czeta), as well as new JNK-interacting proteins (MLL4, NEIL1). Finally, we identified new D-site-dependent MAPK substrates, including the hedgehog-regulated transcription factors Gli1 and Gli3, suggesting that a direct connection between MAP kinase and hedgehog signaling may occur at the level of these key regulators. These results demonstrate that a genome-wide search for MAP kinase docking sites can be used to find new docking sites and substrates.


Subject(s)
Algorithms , Artificial Intelligence , Knowledge Bases , Mitogen-Activated Protein Kinases/chemistry , Binding Sites , Genome, Human , Humans , Kruppel-Like Transcription Factors/chemistry , Nerve Tissue Proteins/chemistry , Protein Binding , Substrate Specificity , Transcription Factors/chemistry , Zinc Finger Protein GLI1 , Zinc Finger Protein Gli3
14.
Bioinformatics ; 24(13): i357-65, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18586735

ABSTRACT

MOTIVATION: Small organic molecules, from nucleotides and amino acids to metabolites and drugs, play a fundamental role in chemistry, biology and medicine. As databases of small molecules continue to grow and become more open, it is important to develop the tools to search them efficiently. In order to develop a BLAST-like tool for small molecules, one must first understand the statistical behavior of molecular similarity scores. RESULTS: We develop a new detailed theory of molecular similarity scores that can be applied to a variety of molecular representations and similarity measures. For concreteness, we focus on the most widely used measure, the Tanimoto measure applied to chemical fingerprints. In both the case of empirical fingerprints and fingerprints generated by several stochastic models, we derive accurate approximations for both the distribution and extreme value distribution of similarity scores. These approximations are derived using a ratio of correlated Gaussians approach. The theory enables the calculation of significance scores, such as Z-scores and P-values, and the estimation of the top hits list size. Empirical results obtained using both the random models and real data from the ChemDB database are given to corroborate the theory and show how it can be applied to mine chemical space. AVAILABILITY: Data and related resources are available through http://cdb.ics.uci.edu.
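
For readers unfamiliar with the Tanimoto measure, the sketch below computes it for binary fingerprints represented as sets of "on" bit indices, and then converts a query score into a Z-score against an empirical background of scores. The fingerprints and background values are hypothetical, and the empirical Z-score here is a simple stand-in: it is not the analytical ratio-of-correlated-Gaussians approximation the abstract describes.

```python
from statistics import mean, stdev


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity: shared on-bits divided by on-bits in either fingerprint."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def z_score(score, background_scores):
    """Number of sample standard deviations by which score exceeds the background mean."""
    return (score - mean(background_scores)) / stdev(background_scores)


# Hypothetical fingerprints given as indices of bits set to 1.
query = {1, 2, 3}
candidate = {2, 3, 4}
similarity = tanimoto(query, candidate)  # 2 shared bits out of 4 distinct bits

# Hypothetical similarity scores of the query against random molecules.
background = [0.1, 0.2, 0.3]
print(similarity, z_score(0.9, background))
```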


Subject(s)
Algorithms , Chemistry Techniques, Analytical/methods , Data Interpretation, Statistical , Databases, Factual , Organic Chemicals/chemistry , Pattern Recognition, Automated/methods
15.
J Chem Inf Model ; 48(6): 1138-51, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18522387

ABSTRACT

Power-law distributions have been observed in a wide variety of areas. To our knowledge, however, there has been no systematic observation of power-law distributions in chemoinformatics. Here, we present several examples of power-law distributions arising from the features of small, organic molecules. The distributions of rigid segments and ring systems, the distributions of molecular paths and circular substructures, and the sizes of molecular similarity clusters all show linear trends on log-log rank/frequency plots, suggesting underlying power-law distributions. The number of unique features also follows Heaps'-like laws. The characteristic exponents of the power-laws lie in the 1.5-3 range, consistent with the exponents observed in other power-law phenomena. The power-law nature of these distributions leads to several applications including the prediction of the growth of available data through Heaps' law and the optimal allocation of experimental or computational resources via the 80/20 rule. More importantly, we also show how the power-laws can be leveraged to efficiently compress chemical fingerprints in a lossless manner, useful for the improved storage and retrieval of molecules in large chemical databases.
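
A linear trend on a log-log rank/frequency plot can be checked numerically by fitting a straight line to log(frequency) against log(rank). The sketch below does this with ordinary least squares on synthetic data. This is a rough diagnostic under stated assumptions, not necessarily the fitting procedure the authors used, and least-squares fits on log-log data are known to be a crude estimator of power-law exponents.

```python
import math


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) versus log(rank), rank 1 = most frequent."""
    ordered = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(freq) for freq in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic frequencies following frequency = 1000 * rank^(-2) exactly (illustrative).
synthetic = [1000.0 * rank ** -2.0 for rank in range(1, 51)]
print(f"fitted slope = {loglog_slope(synthetic):.3f}")
```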


Subject(s)
Models, Statistical , Organic Chemicals/chemistry , Small Molecule Libraries/chemistry , Cluster Analysis , Markov Chains
16.
J Chem Inf Model ; 47(6): 2098-109, 2007.
Article in English | MEDLINE | ID: mdl-17967006

ABSTRACT

Many modern chemoinformatics systems for small molecules rely on large fingerprint vector representations, where the components of the vector record the presence or number of occurrences in the molecular graphs of particular combinatorial features, such as labeled paths or labeled trees. These large fingerprint vectors are often compressed to much shorter fingerprint vectors using a lossy compression scheme based on a simple modulo procedure. Here, we combine statistical models of fingerprints with integer entropy codes, such as Golomb and Elias codes, to encode the indices or the run lengths of the fingerprints. After reordering the fingerprint components by decreasing frequency order, the indices are monotone-increasing and the run lengths are quasi-monotone-increasing, and both exhibit power-law distribution trends. We take advantage of these statistical properties to derive new efficient, lossless, compression algorithms for monotone integer sequences: monotone value (MOV) coding and monotone length (MOL) coding. In contrast to lossy systems that use 1024 or more bits of storage per molecule, we can achieve lossless compression of long chemical fingerprints based on circular substructures in slightly over 300 bits per molecule, close to the Shannon entropy limit, using a MOL Elias Gamma code for run lengths. The improvement in storage comes at a modest computational cost. Furthermore, because the compression is lossless, uncompressed similarity (e.g., Tanimoto) between molecules can be computed exactly from their compressed representations, leading to significant improvements in retrieval performance, as shown on six benchmark data sets of druglike molecules.
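
To make the idea of coding run lengths with an integer entropy code concrete, the sketch below encodes the gaps between consecutive "on" bit indices of a sparse fingerprint with the standard Elias gamma code and decodes them back without loss. It is plain gap coding only: it does not implement the MOV or MOL schemes from the abstract, and the example indices are hypothetical.

```python
def elias_gamma_encode(n):
    """Elias gamma code of a positive integer: (bit length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias gamma codes are defined for integers >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def encode_indices(indices):
    """Encode sorted, distinct bit indices as Elias gamma codes of successive gaps."""
    bits = []
    previous = -1  # so that index 0 yields a gap of 1, the smallest codable value
    for index in indices:
        bits.append(elias_gamma_encode(index - previous))
        previous = index
    return "".join(bits)


def decode_indices(bitstring):
    """Invert encode_indices, recovering the original list of bit indices."""
    indices = []
    previous = -1
    pos = 0
    while pos < len(bitstring):
        zeros = 0
        while bitstring[pos] == "0":  # count the unary length prefix
            zeros += 1
            pos += 1
        gap = int(bitstring[pos:pos + zeros + 1], 2)
        pos += zeros + 1
        previous += gap
        indices.append(previous)
    return indices


# Hypothetical sparse fingerprint: positions of the bits set to 1.
on_bits = [0, 3, 4, 10]
encoded = encode_indices(on_bits)
print(encoded, decode_indices(encoded))
```

Small gaps receive short codes, so the scheme pays off when on-bits cluster at low indices, which is the situation the abstract creates by reordering components by decreasing frequency.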


Subject(s)
Entropy , Models, Chemical , Molecular Structure , Time Factors
17.
J Phys Chem B ; 110(47): 24157-64, 2006 Nov 30.
Article in English | MEDLINE | ID: mdl-17125387

ABSTRACT

In the absence of external stress, the surface tension of a lipid membrane vanishes at equilibrium, and the membrane exhibits long wavelength undulations that can be described as elastic (as opposed to tension-dominated) deformations. These long wavelength fluctuations are generally suppressed in molecular dynamics simulations of membranes, which have typically been carried out on membrane patches with areas <100 nm² that are replicated by periodic boundary conditions. As a result, finite system-size effects in molecular dynamics simulations of lipid bilayers have been subject to much discussion in the membrane simulation community for several years, and it has been argued that it is necessary to simulate small membrane patches under tension to properly model the tension-free state of macroscopic membranes. Recent hardware and software advances have made it possible to simulate larger, all-atom systems allowing us to directly address the question of whether the relatively small size of current membrane simulations affects their physical characteristics compared to real macroscopic bilayer systems. In this work, system-size effects on the structure of a DOPC bilayer at 5.4 H2O/lipid are investigated by performing molecular dynamics simulations at constant temperature and isotropic pressure (i.e., vanishing surface tension) of small and large single bilayer patches (72 and 288 lipids, respectively), as well as an explicitly multilamellar system consisting of a stack of five 72-lipid bilayers, all replicated in three dimensions by using periodic boundary conditions. The simulation results are compared to X-ray and neutron diffraction data by using a model-free, reciprocal space approach developed recently in our laboratories. Our analysis demonstrates that finite-size effects are negligible in simulations of DOPC bilayers at low hydration, and suggests that refinements are needed in the simulation force fields.
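
Periodic boundary conditions, mentioned above, are commonly implemented with two small operations: wrapping coordinates back into the simulation box and applying the minimum image convention to displacements. The one-dimensional sketch below illustrates both for a rectangular box. The box length and coordinates are hypothetical, and production simulation packages handle this internally, often for non-rectangular cells.

```python
def wrap(coordinate, box_length):
    """Map a coordinate into the primary box interval [0, box_length)."""
    return coordinate % box_length


def minimum_image(displacement, box_length):
    """Shortest displacement between two particles, considering all periodic images."""
    return displacement - box_length * round(displacement / box_length)


# Hypothetical box of length 5.0 nm along one axis.
BOX = 5.0
print(wrap(-0.5, BOX))           # a particle that left through the lower face re-enters
print(minimum_image(4.5, BOX))   # the nearest image is 0.5 away in the other direction
```

Because each particle interacts with the nearest image of its neighbors, undulations with wavelengths longer than the box cannot appear, which is the source of the finite-size concern the abstract examines.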


Subject(s)
Computer Simulation , Lipid Bilayers/chemistry , Membrane Fluidity , Phosphatidylcholines/chemistry , X-Ray Crystallography , Biological Models , Molecular Conformation , Surface Tension , Water/chemistry
18.
Biophys J ; 91(10): 3617-29, 2006 Nov 15.
Article in English | MEDLINE | ID: mdl-16950837

ABSTRACT

We have recently shown that current molecular dynamics (MD) atomic force fields are not yet able to produce lipid bilayer structures that agree with experimentally determined structures within experimental errors. Because of the many advantages offered by experimentally validated simulations, we have developed a novel restraint method for membrane MD simulations that uses experimental diffraction data. The restraints, introduced into the MD force field, act upon specified groups of atoms to restrain their mean positions and widths to values determined experimentally. The method was first tested using a simple liquid argon system, and then applied to a neat dioleoylphosphatidylcholine (DOPC) bilayer at 66% relative humidity and to the same bilayer containing the peptide melittin. Application of experiment-based restraints to the transbilayer double-bond and water distributions of neat DOPC bilayers led to distributions that agreed with the experimental values. Relative to the experimental structure, the restraints improved the simulated structure in some regions while introducing larger differences in others, as might be expected from imperfect force fields. For the DOPC-melittin system, the experimental transbilayer distribution of melittin was used as a restraint. The addition of the peptide perturbed the simulated bilayer structure, but the perturbations were larger than those observed experimentally. The melittin distribution of the simulation could be fit accurately to a Gaussian with parameters close to the observed ones, indicating that the restraints can be used to produce an ensemble of membrane-bound peptide conformations that are consistent with experiments. Such ensembles pave the way for understanding peptide-bilayer interactions at the atomic level.
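
The abstract does not give the functional form of the restraints, so the following is only a minimal sketch of the general idea of restraining a group's mean position and width. It assumes a harmonic penalty on the instantaneous mean and standard deviation of the atoms' z coordinates; the function name `distribution_restraint`, the force constants `k_mu` and `k_sigma`, and the target values are all hypothetical.

```python
import numpy as np

def distribution_restraint(z, mu0, sigma0, k_mu=1.0, k_sigma=1.0):
    """Harmonic penalty on the mean and width of a group's z distribution.

    ASSUMPTION: this harmonic form is illustrative only; it is not the
    restraint potential used in the cited paper.
    E = k_mu/2 (mu - mu0)^2 + k_sigma/2 (sigma - sigma0)^2
    Returns the energy and the force -dE/dz_i on every atom.
    """
    z = np.asarray(z, dtype=float)
    n = z.size
    mu = z.mean()
    sigma = z.std()  # population standard deviation (ddof=0)
    energy = 0.5 * k_mu * (mu - mu0) ** 2 + 0.5 * k_sigma * (sigma - sigma0) ** 2
    # Chain rule: d(mu)/dz_i = 1/n and d(sigma)/dz_i = (z_i - mu) / (n * sigma)
    grad = k_mu * (mu - mu0) / n + k_sigma * (sigma - sigma0) * (z - mu) / (n * sigma)
    return energy, -grad

# Toy relaxation: follow the restraint forces alone (no physical force field).
rng = np.random.default_rng(0)
z = rng.normal(loc=2.0, scale=5.0, size=200)  # made-up starting positions
step = 0.5 * z.size                           # step size chosen for this toy problem
for _ in range(60):
    _, forces = distribution_restraint(z, mu0=0.0, sigma0=3.0)
    z = z + step * forces
```

In this toy relaxation the group's mean and width move to the target values because nothing opposes the restraint. In a real simulation the restraint forces would be added to the force-field forces, and the balance between the two determines the result.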


Subject(s)
Lipid Bilayers/chemistry , Melitten/chemistry , Membrane Proteins/chemistry , Chemical Models , Molecular Models , Phosphatidylcholines/chemistry , Birefringence , Computer Simulation , Peptides/chemistry , Mechanical Stress
19.
Biophys J ; 88(2): 805-17, 2005 Feb.
Article in English | MEDLINE | ID: mdl-15533925

ABSTRACT

A novel protocol has been developed for comparing the structural properties of lipid bilayers determined by simulation with those determined by diffraction experiments, which makes it possible to test critically the ability of molecular dynamics simulations to reproduce experimental data. This model-independent method consists of analyzing data from molecular dynamics bilayer simulations in the same way as experimental data by determining the structure factors of the system and, via Fourier reconstruction, the overall transbilayer scattering-density profiles. Multi-nanosecond molecular dynamics simulations of a dioleoylphosphatidylcholine bilayer at 66% RH (5.4 waters/lipid) were performed in the constant pressure and temperature ensemble using the united-atom GROMACS and the all-atom CHARMM22/27 force fields with the GROMACS and NAMD software packages, respectively. The quality of the simulated bilayer structures was evaluated by comparing simulation with experimental results for bilayer thickness, area/lipid, individual molecular-component distributions, continuous and discrete structure factors, and overall scattering-density profiles. Neither the GROMACS nor the CHARMM22/27 simulations reproduced experimental data within experimental error. The widths of the simulated terminal methyl distributions showed a particularly strong disagreement with the experimentally observed distributions. A comparison of the older CHARMM22 with the newer CHARMM27 force fields shows that significant progress is being made in the development of atomic force fields for describing lipid bilayer systems empirically.
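
The step from a simulated profile to structure factors and back can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes a centrosymmetric profile (so only cosine terms appear), omits the h = 0 term (so the profile is recovered minus its mean), and uses a made-up, band-limited test profile and a repeat spacing `d` of 50 Å. The function names are hypothetical.

```python
import numpy as np

def structure_factors(z, rho, d, h_max):
    """Discrete structure factors of a centrosymmetric profile.

    F(h) = integral over one repeat of rho(z) * cos(2*pi*h*z/d) dz, h = 1..h_max.
    z must be a uniform grid covering exactly one repeat, endpoint excluded.
    """
    dz = z[1] - z[0]
    h = np.arange(1, h_max + 1)
    phases = np.cos(2.0 * np.pi * h[:, None] * z[None, :] / d)
    return h, (phases @ rho) * dz

def fourier_reconstruct(z, h, F, d):
    """Fourier reconstruction: rho(z) - mean = (2/d) * sum_h F(h) cos(2*pi*h*z/d)."""
    phases = np.cos(2.0 * np.pi * h[:, None] * z[None, :] / d)
    return (2.0 / d) * (F @ phases)

# Made-up test profile containing only orders h = 1, 2 and 4 (arbitrary units).
d = 50.0
z = np.linspace(-d / 2, d / 2, 512, endpoint=False)
q = 2.0 * np.pi / d
rho = 1.0 * np.cos(q * z) - 0.5 * np.cos(2 * q * z) + 0.25 * np.cos(4 * q * z)

h, F = structure_factors(z, rho, d, h_max=8)
rho_rec = fourier_reconstruct(z, h, F, d)  # matches rho, which has zero mean
```

Because the test profile is band-limited, eight orders reconstruct it exactly. A real density profile is not band-limited, so a reconstruction from a handful of observed orders is a smoothed version of it. Applying the same truncation to simulation and experiment lets the two be compared on equal terms.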


Subject(s)
Crystallography/methods , Lipid Bilayers/chemistry , Membrane Fluidity , Chemical Models , Molecular Models , Phosphatidylcholines/chemistry , Computer Simulation , Elasticity , Molecular Conformation , Software , Mechanical Stress