ABSTRACT
With the wealth of experimental physicochemical data available to chemoinformaticians from the literature and from commercial and company databases, an increasing challenge is the interpretation of such datasets. Subtle differences in the experimental methodology used to generate these datasets can give rise to variations in physicochemical property values. Such methodological nuances will be apparent to an expert experimentalist but not necessarily to the data analyst or modeller. This paper describes the differences between common methodologies for measuring the four most important physicochemical properties, namely aqueous solubility, octan-1-ol/water distribution coefficient, pKa, and plasma protein binding, highlighting key factors that can lead to systematic differences. Insight is given into how to identify datasets suitable for combining.