ABSTRACT
Reversed-phase (RP) liquid chromatography is an important tool for the characterization of materials and products in the pharmaceutical industry. Method development is still challenging in this application space, particularly when dealing with closely related compounds. Models of chromatographic selectivity are useful for predicting which columns out of the hundreds that are available are likely to have very similar, or different, selectivity for the application at hand. The hydrophobic subtraction model (HSM1) has been widely employed for this purpose; the column database for this model currently stands at 750 columns. In previous work we explored a refinement of the original HSM1 (HSM2) and found that increasing the size of the dataset used to train the model dramatically reduced the number of gross errors in predictions of selectivity made using the model. In this paper we describe further work in this direction (HSM3), this time based on a much larger solute set (1014 solute/stationary phase combinations) containing selectivities for compounds covering a broader range of physicochemical properties compared to HSM1. The molecular weight range was doubled, and the range of the logarithm of the octanol/water partition coefficients was increased slightly. The number of active pharmaceutical ingredients and related synthetic intermediates and impurities was increased from four to 28, and ten pairs of closely related structures (e.g., geometric and cis-/trans- isomers) were included. The HSM3 model is based on retention measurements for 75 compounds using 13 RP stationary phases and a mobile phase of 40/60 acetonitrile/25 mM ammonium formate buffer at pH 3.2. This data-driven model produced predictions of ln α (chromatographic selectivity using ethylbenzene as the reference compound) with average absolute errors of approximately 0.033, which corresponds to errors in α of about 3%. In some cases, the prediction of the trans-/cis- selectivities for positional and geometric isomers was relatively accurate, and the driving forces for the observed selectivity could be inferred by examination of the relative magnitudes of the terms in the HSM3 model. For some geometric isomer pairs the interactions mainly responsible for the observed selectivities could not be rationalized due to large uncertainties for particular terms in the model. This suggests that more work is needed to explore other HSM-type models and continue expanding the training dataset in order to further improve the predictive accuracy of these models. Additionally, we release with this paper a much larger dataset (43,329 total retention measurements) at multiple mobile phase compositions, to enable other researchers to pursue their own lines of inquiry related to RP selectivity.
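For readers converting between the two error scales quoted above, the arithmetic is simply the first-order relation between an absolute error in ln α and a relative error in α; the block below is a worked restatement of that relation, not an additional result from the paper.

```latex
% Selectivity is defined against the ethylbenzene (EB) reference; a small
% absolute error in ln(alpha) maps to a relative error in alpha of about the same size.
\[
  \alpha = \frac{k_x}{k_{\mathrm{EB}}}, \qquad
  \frac{\Delta\alpha}{\alpha} \;\approx\; \Delta(\ln\alpha) \;=\; 0.033 \;\approx\; 3\%
\]
```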
Subjects
Reversed-Phase Chromatography, Hydrophobic and Hydrophilic Interactions, Reversed-Phase Chromatography/methods, Isomerism, Pharmaceutical Preparations/chemistry, Pharmaceutical Preparations/analysis, Chemical Models, Molecular Weight, Water/chemistry
ABSTRACT
Method development in comprehensive two-dimensional liquid chromatography (LC×LC) is a challenging process. The interdependencies between the two dimensions and the possibility of incorporating complex gradient profiles, such as multi-segmented gradients or shifting gradients, make trial-and-error method development time-consuming and highly dependent on user experience. Retention modeling and Bayesian optimization (BO) have been proposed as solutions to mitigate these issues. However, both approaches have their strengths and weaknesses. On the one hand, retention modeling, which approximates true retention behavior, depends on effective peak tracking and accurate retention time and width predictions, which are increasingly challenging for complex samples and advanced gradient assemblies. On the other hand, Bayesian optimization may require many experiments when dealing with many adjustable parameters, as in LC×LC. Therefore, in this work, we investigate the use of multi-task Bayesian optimization (MTBO), a method that can combine information from both retention modeling and experimental measurements. The algorithm was first tested and compared with BO using a synthetic retention modeling test case, where it was shown that MTBO finds better optima with fewer method-development iterations than conventional BO. Next, the algorithm was tested on the optimization of a method for a pesticide sample and we found that the algorithm was able to improve upon the initial scanning experiments. Multi-task Bayesian optimization is a promising technique in situations where modeling retention is challenging, and the high number of adjustable parameters and/or limited optimization budget makes traditional Bayesian optimization impractical.
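As an illustration of the general idea behind MTBO (not the authors' implementation, which is not reproduced here), the sketch below treats retention-model predictions as a cheap auxiliary task and the LC×LC experiments as the expensive target task, shares one Gaussian-process surrogate across both tasks via a task-indicator input, and proposes the next experiment by expected improvement. The variable names, the toy objective, and the single-GP-with-task-feature shortcut are all assumptions made only for illustration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(mu, sigma, best):
    # Standard expected-improvement acquisition for maximization
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)

# Hypothetical method parameters x (scaled to [0, 1]) and method quality y.
# Task 0 = cheap evaluations from a retention model; task 1 = real experiments.
x_sim = rng.uniform(size=(40, 2)); y_sim = np.sin(4 * x_sim[:, 0]) + x_sim[:, 1]
x_exp = rng.uniform(size=(5, 2));  y_exp = np.sin(4 * x_exp[:, 0]) + x_exp[:, 1] + 0.1

# Crude multi-task surrogate: one GP with a task indicator appended as an input,
# so information from the simulated task informs predictions for the experiments.
X = np.vstack([np.c_[x_sim, np.zeros(len(x_sim))],
               np.c_[x_exp, np.ones(len(x_exp))]])
y = np.concatenate([y_sim, y_exp])
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)

# Propose the next *experiment* (task = 1) by maximizing EI over random candidates.
cand = rng.uniform(size=(1000, 2))
mu, sd = gp.predict(np.c_[cand, np.ones(len(cand))], return_std=True)
print("next method settings to run:", cand[np.argmax(expected_improvement(mu, sd, y_exp.max()))])
```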
Subjects
Algorithms, Bayes Theorem, Liquid Chromatography/methods, Pesticides/isolation & purification, Pesticides/analysis
ABSTRACT
Hydroxypropyl methyl cellulose (HPMC) is a type of cellulose derivative with properties that render it useful in, e.g., the food, cosmetics, and pharmaceutical industries. The substitution degree and composition of the β-glucose subunits of HPMC affect its physical and functional properties, but HPMC characterization is challenging due to its high structural heterogeneity, including many isomers. In this study, comprehensive two-dimensional liquid chromatography-mass spectrometry was used to examine substituted glucose monomers originating from complete acid hydrolysis of HPMC. Resolution between the different monomers was achieved using a C18 column and a cyano column in the first and second LC dimensions, respectively. The data analysis process was structured to obtain fingerprints of the monomers of interest. The results revealed that isomers of the respective monomers could be selectively separated based on the position of substituents. The examination of two industrial HPMC products revealed differences in overall monomer composition. While both products contained monomers with a similar degree of substitution, they exhibited distinct regioselectivity.
Subjects
Hypromellose Derivatives, Glucose/chemistry, Glucose/analysis, Hydrolysis, Hypromellose Derivatives/chemistry, Isomerism, Liquid Chromatography-Mass Spectrometry
ABSTRACT
Reversed-phase liquid chromatography (RPLC) is the analytical tool of choice for monitoring process-related organic impurities and degradants in pharmaceutical materials. Its popularity is due to its general ease-of-use, high performance, and reproducibility in most cases, all of which have improved as the technique has matured over the past few decades. Nevertheless, in our work we still occasionally observe situations where RPLC methods are not as robust as we would like them to be in practice due to variations in stationary phase chemistry between manufactured batches (i.e., lot-to-lot variability), and changes in stationary phase chemistry over time. Over the last three decades several models of RPLC selectivity have been developed and used to quantify and rationalize the effects of numerous parameters (e.g., effect of bonded phase density) on separation selectivity. The Hydrophobic Subtraction Model (HSM) of RPLC selectivity has been used extensively for these purposes; currently the publicly available database of column parameters contains data for 750 columns. In this work we explored the possibility that the HSM could be used to better understand the chemical basis of observed differences in stationary phase selectivity when they occur - for example, lot-to-lot variations or changes in selectivity during column use. We focused our attention on differences and changes in the observed selectivity for a pair of cis-trans isomers of a pharmaceutical intermediate. Although this is admittedly a challenging case, we find that the observed changes in selectivity are not strongly correlated with HSM column parameters, suggesting that there is a gap in the information provided by the HSM with respect to cis-trans isomer selectivity specifically. Further work with additional probe molecules showed that larger changes in cis-trans isomer selectivity were observed for pairs of molecules with greater molecular complexity, compared to the selectivity changes observed for simpler molecules. These results do not provide definitive answers to questions about the chemical basis of changes in stationary phase chemistry that lead to observed differences in cis-trans isomer selectivity. However, the results do provide important insights about the critical importance of molecular complexity when choosing probe compounds and indicate opportunities to develop improved selectivity models with increased sensitivity for cis-trans isomer selectivity.
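The correlation check described above can be pictured in a few lines of code: compute the cis-trans selectivity (ln α) observed on each column or lot and correlate it against HSM column descriptors such as H, S*, and C. The descriptor values and selectivities below are invented placeholders, not data from the paper; only the procedure is illustrated.

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder values for five hypothetical column lots (NOT measured data):
# HSM column descriptors and the observed cis-trans selectivity on each lot.
descriptors = {
    "H":  np.array([0.98, 1.01, 0.85, 1.05, 0.93]),   # hydrophobicity
    "S*": np.array([0.01, -0.03, 0.04, 0.00, 0.02]),  # steric resistance
    "C":  np.array([0.05, -0.10, 0.21, 0.00, 0.12]),  # cation-exchange activity
}
ln_alpha_cis_trans = np.array([0.021, 0.018, 0.030, 0.019, 0.025])

# Weak correlations across all descriptors would indicate the kind of
# "information gap" discussed above for cis-trans isomer selectivity.
for name, x in descriptors.items():
    r, p = pearsonr(x, ln_alpha_cis_trans)
    print(f"{name:>2s}: r = {r:+.2f} (p = {p:.2f})")
```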
Subjects
Reversed-Phase Chromatography, Commerce, Reproducibility of Results, Factual Databases, Pharmaceutical Preparations
ABSTRACT
Method development in comprehensive two-dimensional liquid chromatography (LC × LC) is a complicated endeavor. The dependency between the two dimensions and the possibility of incorporating complex gradient profiles, such as multi-segmented gradients or shifting gradients, renders method development by "trial-and-error" time-consuming and highly dependent on user experience. In this work, an open-source algorithm for the automated and interpretive method development of complex gradients in LC × LC-mass spectrometry (MS) was developed. A workflow was designed to operate within a closed-loop that allowed direct interaction between the LC × LC-MS system and a data-processing computer which ran in an unsupervised and automated fashion. Obtaining accurate retention models in LC × LC is difficult due to the challenges associated with the exact determination of retention times, curve fitting because of the use of gradient elution, and gradient deformation. Thus, retention models were compared in terms of repeatability of determination. Additionally, the design of shifting gradients in the second dimension and the prediction of peak widths were investigated. The algorithm was tested on separations of a tryptic digest of a monoclonal antibody using an objective function that included the sum of resolutions and analysis time as quality descriptors. The algorithm was able to improve the separation relative to a generic starting method using these complex gradient profiles after only four method-development iterations (i.e., sets of chromatographic conditions). Further iterations improved retention time and peak width predictions and thus the accuracy in the separations predicted by the algorithm.
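As a concrete but deliberately simplified picture of the kind of objective function mentioned above, the sketch below scores a candidate method from its pairwise resolutions and total analysis time. The resolution cap, the time weighting, and the function name are illustrative assumptions; the paper's exact quality descriptors and weights are not reproduced here.

```python
import numpy as np

def method_score(resolutions, analysis_time_min, w_time=0.1, rs_cap=1.5):
    """Toy LC x LC objective: reward chromatographic resolution, penalize run time.

    resolutions       : pairwise resolution values for critical peak pairs
    analysis_time_min : total analysis time in minutes
    w_time, rs_cap    : illustrative weight and resolution cap (assumptions)
    """
    # Capping keeps already-baseline-resolved pairs from dominating the sum.
    capped = np.minimum(np.asarray(resolutions, dtype=float), rs_cap)
    return capped.sum() - w_time * analysis_time_min

# Example: four critical pairs from a candidate gradient program (higher score is better)
print(method_score([0.8, 1.2, 2.5, 0.4], analysis_time_min=45.0))
```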
Subjects
Algorithms, Monoclonal Antibodies, Computers, Mass Spectrometry, Liquid Chromatography
ABSTRACT
Many contemporary challenges in liquid chromatography-such as the need for "smarter" method development tools, and deeper understanding of chromatographic phenomena-could be addressed more efficiently and effectively with larger volumes of experimental retention data than are available. The paucity of publicly accessible, high-quality measurements needed for the development of retention models and simulation tools has largely been due to the high cost in time and resources associated with traditional retention measurement approaches. Recently we described an approach to improve the throughput of such measurements by using very short columns (typically 5 mm), while maintaining measurement accuracy. In this paper we present a perspective on the characteristics of a dataset containing about 13,000 retention measurements obtained using this approach, and describe a different sample introduction method that is better suited to this application than the approach we used in prior work. The dataset comprises results for 35 different small molecules, nine different stationary phases, and several mobile phase compositions for each analyte/phase combination. During the acquisition of these data, we have interspersed repeated measurements of a small number of compounds for quality control purposes. The data from these measurements not only enable detection of outliers but also assessment of the repeatability and reproducibility of retention measurements over time. For retention factors greater than 1, the mean relative standard deviation (RSD) of replicate (typically n = 5) measurements is 0.4%, and the standard deviation of RSDs is 0.4%. Most differences between selectivity values measured six months apart for 15 non-ionogenic compounds were in the range of ±1%, indicating good reproducibility. A critically important observation from these analyses is that selectivity defined as retention of a given analyte relative to the retention of a reference compound (k_x/k_ref) is a much more consistent measure of retention over a time span of months compared to the retention factor alone. While this work and dataset also highlight the importance of stationary phase stability over time for achieving reliable retention measurements, we are nevertheless optimistic that this approach will enable the compilation of large databases (>> 10,000 measurements) of retention values over long time periods (years), which can in turn be leveraged to address some of the most important contemporary challenges in liquid chromatography. All the data discussed in the manuscript are provided as Supplemental Information.
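The point about selectivity being a more stable quantity than the retention factor itself can be illustrated numerically: when the analyte and the reference are measured in the same runs, run-to-run drifts largely cancel in the ratio k_x/k_ref. The replicate values below are synthetic, correlated-drift examples chosen only to show the cancellation, not measurements from the dataset.

```python
import numpy as np

def rsd_percent(values):
    # Relative standard deviation of replicate measurements, in percent
    v = np.asarray(values, dtype=float)
    return 100.0 * v.std(ddof=1) / v.mean()

# Synthetic replicates (NOT real data): both k values drift together run-to-run,
# as they would if the drift came from the column or mobile phase rather than the analyte.
k_x   = np.array([2.51, 2.47, 2.55, 2.49, 2.53])   # analyte retention factors
k_ref = np.array([1.26, 1.24, 1.28, 1.25, 1.27])   # reference (e.g., ethylbenzene)

print(f"RSD(k_x)       = {rsd_percent(k_x):.2f}%")
print(f"RSD(k_x/k_ref) = {rsd_percent(k_x / k_ref):.2f}%")
```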
Subjects
Reproducibility of Results, Liquid Chromatography/methods, Indicators and Reagents, Computer Simulation, Factual Databases, High-Pressure Liquid Chromatography/methods
ABSTRACT
Efforts to model and simulate various aspects of liquid chromatography (LC) separations (e.g., retention, selectivity, peak capacity, injection breakthrough) depend on experimental retention measurements to use as the basis for the models and simulations. Often these modeling and simulation efforts are limited by datasets that are too small because of the cost (time and money) associated with making the measurements. Other groups have demonstrated improvements in throughput of LC separations by focusing on "overhead" associated with the instrument itself - for example, between-analysis software processing time and autosampler motions. In this paper we explore the possibility of using columns with small volumes (i.e., 5 mm × 2.1 mm i.d.) compared to conventional columns (e.g., 100 mm × 2.1 mm i.d.) that are typically used for retention measurements. We find that isocratic retention factors calculated for columns with these dimensions differ by about 20% from those measured using conventional columns; we attribute this difference - which we interpret as an error in measurements based on data from the 5 mm column - to extra-column volume associated with inlet and outlet frits. Since retention factor is a thermodynamic property of the mobile/stationary phase system under study, it should be independent of the dimensions of the column that is used for the measurement. We propose using ratios of retention factors (i.e., selectivities) to translate retention measurements between columns of different dimensions, so that measurements made using small columns can be used to make predictions for separations that involve conventional columns. We find that this approach reduces the difference in retention factors (5 mm compared to 100 mm columns) from an average of 18% to an average absolute difference of 1.7% (all errors less than 8%). This approach will significantly increase the rate at which high quality retention data can be collected to thousands of measurements per instrument per day, which in turn will likely have a profound impact on the quality of models and simulations that can be developed for many aspects of LC separations.
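One way to write down the translation step described above (a sketch of the idea, not necessarily the exact formulation used in the paper): measure the selectivity on the short column and anchor it with a single reference-compound retention factor measured on the conventional column.

```latex
% Selectivity measured on the 5 mm column, then rescaled to predict retention
% on a 100 mm column of the same chemistry via one reference measurement.
\[
  \alpha_x^{\,5\,\mathrm{mm}} \;=\; \frac{k_x^{\,5\,\mathrm{mm}}}{k_{\mathrm{ref}}^{\,5\,\mathrm{mm}}},
  \qquad
  k_x^{\,100\,\mathrm{mm}} \;\approx\; \alpha_x^{\,5\,\mathrm{mm}} \, k_{\mathrm{ref}}^{\,100\,\mathrm{mm}}
\]
```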
Subjects
Software, High-Pressure Liquid Chromatography/methods, Liquid Chromatography/methods, Computer Simulation, Indicators and Reagents
ABSTRACT
A peak-tracking algorithm was developed for use in comprehensive two-dimensional liquid chromatography coupled to mass spectrometry. Chromatographic peaks were tracked across two different chromatograms, utilizing the available spectral information, the statistical moments of the peaks, and the relative retention times in both dimensions. The algorithm consists of three branches. In the pre-processing branch, system peaks are removed by comparing their mass spectra to those of low-intensity regions, and search windows are applied, relative to the retention times in each dimension, to reduce the required computational power by eliminating unlikely pairs. In the comparison branch, similarity between the spectral information and statistical moments of peaks within the search windows is calculated. Lastly, in the evaluation branch, extracted-ion-current chromatograms are utilized to assess the validity of the pairing results. The algorithm was applied to peptide retention data recorded under varying chromatographic conditions for use in retention modelling as part of method optimization tools. Moreover, the algorithm was applied to complex peptide mixtures obtained from enzymatic digestion of monoclonal antibodies. The algorithm yielded no false positives. However, due to limitations in the peak-detection algorithm, cross-pairing within the same peaks occurred and six trace compounds remained falsely unpaired.
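To make the branch structure concrete, the sketch below isolates the two steps that are simplest to show: a retention-time search window (pre-processing) and a spectral-similarity score for the surviving candidate pairs (comparison). The window tolerances, the cosine-similarity choice, and the toy peak records are illustrative assumptions, not the algorithm's actual parameters.

```python
import numpy as np

def spectral_similarity(spec_a, spec_b):
    # Cosine similarity between two mass spectra binned on a common m/z axis
    a, b = np.asarray(spec_a, float), np.asarray(spec_b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def candidate_pairs(peaks_1, peaks_2, rt1_tol=0.5, rt2_tol=2.0):
    # Pre-processing step (sketch): keep only pairs whose first-dimension (min)
    # and second-dimension (s) retention times fall inside a search window.
    return [(i, j)
            for i, p in enumerate(peaks_1)
            for j, q in enumerate(peaks_2)
            if abs(p["rt1"] - q["rt1"]) <= rt1_tol and abs(p["rt2"] - q["rt2"]) <= rt2_tol]

# Toy peaks from two chromatograms: retention times and a binned spectrum
run1 = [{"rt1": 10.2, "rt2": 12.0, "spec": [0, 5, 9, 1]}]
run2 = [{"rt1": 10.5, "rt2": 11.4, "spec": [0, 6, 8, 1]},
        {"rt1": 22.0, "rt2": 30.0, "spec": [7, 0, 0, 2]}]

for i, j in candidate_pairs(run1, run2):
    score = spectral_similarity(run1[i]["spec"], run2[j]["spec"])
    print(f"run1 peak {i} <-> run2 peak {j}: spectral similarity = {score:.3f}")
```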
Subjects
Algorithms, Monoclonal Antibodies/analysis, Liquid Chromatography/methods, Peptides/analysis, Mass Spectrometry/methods, Automated Pattern Recognition, Reference Standards
ABSTRACT
The hydrophobic subtraction model (HSM) for characterizing the selectivity of reversed-phase liquid chromatography (LC) columns has been used extensively by the LC community since it was first developed in 2002. Continuing interest in the model is due in part to the large, publicly available set of column descriptors that has been assembled over the past 18 years. In the work described in this report, we sought to refine the HSM with the goal of improving the predictive accuracy of the model without compromising its physico-chemical interpretability. The approach taken here has the following facets. A set of retention measurements for 635 columns and the 16 probe solutes used to characterize new columns using the HSM was assembled. Principal components analysis (PCA) was used as a guide for the development of a refined version of the HSM. Outlying columns (84 in total) were eliminated from the analysis because they were inconsistent with the PCA model or were outliers with respect to the original HSM model. With the retention dataset for the 16 probe solutes on the remaining 551 columns, we determined that a six-component model is the most sophisticated form of the model that can be used without overfitting the data. In our refined version of the HSM, the S*σ term has been removed. Two new terms have been added, which more accurately account for the molecular volume of the solute (Vv) and the solute dipolarity (Dd), and the remaining terms have been adjusted to accommodate these changes. The refined model described here provides improved prediction of retention factors, with the model standard error being reduced from 1.0 for the original HSM to 0.35 for the refined model (16 solutes, 551 columns). Furthermore, the number of retention factors with errors greater than 10% is reduced from 231 to 25. A revised metric for column similarity, F, is also proposed as a part of this work.
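For context, the original HSM expresses selectivity relative to ethylbenzene as a sum of products of solute descriptors (lowercase, primed) and column descriptors (uppercase). The first equation below is the standard published form; the second is only a schematic of the refinement described above (the S*σ term dropped, volume and dipolarity terms added), since the exact fitted form and coefficients are given in the paper itself.

```latex
% Original hydrophobic subtraction model (ethylbenzene, EB, as the reference solute)
\[
  \log\alpha \;=\; \log\!\frac{k}{k_{\mathrm{EB}}}
  \;=\; \eta' H \;-\; \sigma' S^{*} \;+\; \beta' A \;+\; \alpha' B \;+\; \kappa' C
\]
% Schematic of the refined six-term model described above (not the fitted equation):
% the steric term is removed and solute-volume (Vv) and dipolarity (Dd) terms are added.
\[
  \log\alpha \;\approx\; \eta' H \;+\; V v \;+\; D d \;+\; \beta' A \;+\; \alpha' B \;+\; \kappa' C
\]
```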