ABSTRACT
Medicinal chemistry and drug design efforts can be assisted by machine learning (ML) models that relate the molecular structure to compound properties. Such quantitative structure-property relationship models are generally trained on large data sets that include diverse chemical series (global models). In the pharmaceutical industry, these ML global models are available across discovery projects as an "out-of-the-box" solution to assist in drug design, synthesis prioritization, and experiment selection. However, drug discovery projects typically focus on confined parts of the chemical space (e.g., chemical series), where global models might not be applicable. Local ML models are sometimes generated to focus on specific projects or series. Herein, ML-based global models, local models, and hybrid global-local strategies were benchmarked. Analyses were done for more than 300 drug discovery projects at Novartis and ten absorption, distribution, metabolism, and excretion (ADME) assays. In this work, hybrid global-local strategies based on transfer learning approaches were proposed to leverage both historical ADME data (global) and project-specific data (local) to adapt model predictions. Fine-tuning a pretrained global ML model (used for weights' initialization, WI) was the top-performing method. Average improvements of mean absolute errors across all assays were 16% and 27% compared with global and local models, respectively. Interestingly, when the effect of training set size was analyzed, WI fine-tuning was found to be successful even in low-data scenarios (e.g., ~10 molecules per project). Taken together, this work highlights the potential of domain adaptation in the field of molecular property predictions to refine existing pretrained models on a new compound data distribution.
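Illustrative note: the weight-initialization (WI) idea described above can be sketched with a toy model. The snippet below is only a minimal stand-in, assuming a one-descriptor linear model trained by gradient descent; the data, the function names (`fit_linear`, `mae`) and the hyperparameters are hypothetical and are not taken from the study, which does not specify them in the abstract. A "global" model is fitted first, and its weights are then used as the starting point for a short fine-tuning run on a small, shifted "local" data set.

```python
def fit_linear(xs, ys, w=0.0, b=0.0, lr=0.01, steps=2000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error.

    The optimisation starts from the supplied (w, b), so passing the
    weights of a previously trained model gives weight-initialised
    fine-tuning.
    """
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mae(w, b, xs, ys):
    """Mean absolute error of the linear model on (xs, ys)."""
    return sum(abs(w * x + b - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical "global" data: many compounds following y = 2x + 1.
global_x = [float(i) for i in range(10)]
global_y = [2.0 * x + 1.0 for x in global_x]

# Hypothetical "local" project: three compounds with a shifted trend y = 2x + 3.
local_x = [4.0, 5.0, 6.0]
local_y = [2.0 * x + 3.0 for x in local_x]

# Held-out compounds from the same hypothetical project.
test_x = [4.5, 5.5]
test_y = [2.0 * x + 3.0 for x in test_x]

# Step 1: train the global model from scratch.
w_global, b_global = fit_linear(global_x, global_y)

# Step 2: fine-tune on the local data, starting from the global weights.
w_ft, b_ft = fit_linear(local_x, local_y, w=w_global, b=b_global, steps=200)

mae_global = mae(w_global, b_global, test_x, test_y)
mae_finetuned = mae(w_ft, b_ft, test_x, test_y)
print(f"global MAE: {mae_global:.3f}, fine-tuned MAE: {mae_finetuned:.3f}")
```

In this toy setting the fine-tuned model has a lower error on the project's held-out compounds than the unadapted global model, which mirrors the direction of the reported result but says nothing about its magnitude.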
Subjects
Deep Learning, Drug Discovery/methods, Drug Design, Machine Learning, Quantitative Structure-Activity Relationship
ABSTRACT
Cell-cell adhesion plays a vital role in the development and maintenance of multicellular organisms. One of its functions is regulation of cell migration, such as occurs, e.g. during embryogenesis or in cancer. In this work, we develop a versatile multiscale approach to modelling a moving self-adhesive cell population that combines a careful microscopic description of a deterministic adhesion-driven motion component with an efficient mesoscopic representation of a stochastic velocity-jump process. This approach gives rise to mesoscopic models in the form of kinetic transport equations featuring multiple non-localities. Subsequent parabolic and hyperbolic scalings produce general classes of equations with non-local adhesion and myopic diffusion, a special case being the classical macroscopic model proposed in Armstrong et al. (J Theoret Biol 243(1): 98-113, 2006). Our simulations show how the combination of the two motion effects can unfold. Cell-cell adhesion relies on the subcellular cell adhesion molecule binding. Our approach lends itself conveniently to capturing this microscopic effect. On the macroscale, this results in an additional non-linear integral equation of a novel type that is coupled to the cell density equation.
Subjects
Embryonic Development, Cell Adhesion, Cell Movement, Diffusion, Kinetics
ABSTRACT
Machine learning (ML) has become an indispensable tool to predict absorption, distribution, metabolism, and excretion (ADME) properties in pharmaceutical research. ML algorithms are trained on molecular structures and corresponding ADME assay data to develop quantitative structure-property relationship (QSPR) models. Traditional QSPR models were trained on compound sets of limited size. With the advent of more complex ML algorithms and data availability, training sets have become larger and more diverse. The most common training approaches consist of training a model either with a small set of similar compounds, namely compounds designed for the same drug discovery project or chemical series (local model approach), or with a larger set of diverse compounds (global model approach). Global models are built with all experimental data available for an assay, combining compound data from different projects and disease areas. Despite the ML progress made so far, the choice of the appropriate data composition for building ML models is still unclear. Herein, a systematic evaluation of local and global ML models was performed for 10 different experimental assays and 112 drug discovery projects. Results show a consistently superior performance of global models for ADME property predictions. Diagnostic analyses were also carried out to investigate the influence of training set size, structural diversity, and data shift on the relative performance of local and global ML models. Training set size and structural diversity did not have an impact on the relative performance of the methods. Instead, data shift helped to identify the projects with larger performance differences between local and global models. Results presented in this work can be leveraged to improve ML-based ADME property predictions and thus decision-making in drug discovery projects.
Subjects
Drug Discovery, Machine Learning, Drug Discovery/methods, Algorithms, Molecular Structure, Quantitative Structure-Activity Relationship, Pharmaceutical Preparations, Pharmacokinetics
ABSTRACT
Accurate blood glucose (BG) forecasting is key in diabetes management, as it allows preventive actions to mitigate harmful hypoglycemic/hyperglycemic episodes. Considering the encouraging results obtained by seasonal stochastic models in proof-of-concept studies, this work assesses the methodology in two datasets (open-loop and closed-loop) recorded in free-living conditions. First, similar postprandial glycemic profiles are grouped together with fuzzy C-means clustering. Then, a seasonal stochastic model is identified for each cluster. Finally, real-time BG forecasting is performed by weighting each model's prediction. The proposed methodology (named C-SARIMA) is compared to other linear and nonlinear black-box methods: autoregressive integrated moving average (ARIMA), its variant with input (ARIMAX), a feed-forward neural network (NN), and its modified version (NN-X) fed by BG, insulin, and carbohydrates (timing and dosing) information for several prediction horizons (PHs). In the open-loop dataset, C-SARIMA yields a median root-mean-squared error (RMSE) of 20.13 mg/dL (PH = 30 min) and 27.23 mg/dL (PH = 45 min), not significantly different from ARIMA and NN. Over a longer PH, C-SARIMA achieves an RMSE = 31.96 mg/dL (PH = 60 min) and RMSE = 33.91 mg/dL (PH = 75 min), significantly outperforming the ARIMA and NN, without significant differences from the ARIMAX for PH ≥ 45 min and the NN-X for PH ≥ 60 min. Similar results hold on the closed-loop dataset: for PH = 30 and 45 min, the C-SARIMA achieves an RMSE = 21.63 mg/dL and RMSE = 29.67 mg/dL, not significantly different from the ARIMA and NN. On longer PH, the C-SARIMA outperforms the ARIMA for PH > 45 min and the NN for PH > 60 min without significant differences from the ARIMAX for PH ≥ 45 min. Although it uses less input information, the C-SARIMA achieves performance similar to that of other prediction methods such as the ARIMAX and NN-X and outperforms the CGM-only approaches for PH > 45 min.
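Illustrative note: the final step above, weighting each cluster model's prediction, can be sketched as follows. This is a minimal sketch under stated assumptions: the weights are taken to be the standard fuzzy C-means membership degrees of the current glucose profile (the abstract does not state the exact weighting rule), the fuzzifier `m = 2` is a conventional default, and the cluster centres and per-cluster forecasts are made-up numbers standing in for the outputs of the identified seasonal models.

```python
import math


def fuzzy_memberships(sample, centers, m=2.0):
    """Standard fuzzy C-means membership of `sample` in each cluster centre.

    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), with Euclidean distances d.
    """
    dists = [math.dist(sample, c) for c in centers]
    # A sample that coincides with a centre belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


def blended_forecast(memberships, cluster_forecasts):
    """Membership-weighted combination of the per-cluster forecasts."""
    return sum(u * f for u, f in zip(memberships, cluster_forecasts))


# Hypothetical two-point glucose profiles (mg/dL) used as cluster centres.
centers = [(100.0, 120.0), (180.0, 200.0)]
current_profile = (120.0, 140.0)

# Hypothetical forecasts (mg/dL) from the model identified for each cluster.
cluster_forecasts = [130.0, 190.0]

weights = fuzzy_memberships(current_profile, centers)
forecast = blended_forecast(weights, cluster_forecasts)
print(weights, forecast)
```

Because the memberships sum to one, the blended forecast always lies between the smallest and largest per-cluster forecast.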
Subjects
Glucose, Hypoglycemia, Humans, Social Conditions, Seasons, Meals, Blood Glucose
ABSTRACT
The article deals with a computer-supported design of optimal and robust proportional-integral-derivative controllers with two degrees of freedom (2DoF PID) for a double integrator plus dead-time (DIPDT) process model. The particular design steps are discussed in terms of intelligent use of all available information extracted from a database of control tracking and disturbance rejection step responses, assessed by means of speed and shape-related performance measures of the process input and output signals, and denoted as a performance portrait (PP). In the first step, the performance portrait method (PPM) is used to verify that the pilot analytical design of the parallel 2DoF PID controller did not omit practically interesting settings, and it shows that the optimality analysis can easily be extended to the series 2DoF PID controller. The series controller is important because it acts as an explicit observer of equivalent input disturbances based on steady-state input values of ultra-local DIPDT models, whereas the parallel PID controller, allowing faster transient responses, needs an additional low-pass filter when reconstructed equivalent disturbances are required. Next, the design efficiency and conciseness in analyzing the effects of different loop parameters on changing the optimal processes are illustrated by an iterative use of PPM, enabled by the visualization of the dependence between the closed-loop performance and the shapes of the control signals. The main contribution of the paper is the introduction of PPM as an intelligent method for controller tuning that mimics an expert with sufficient experience to select the most appropriate solution based on a database of known solutions. In doing so, the analysis in this paper reveals new, previously undiscovered dimensions of PID control design.
ABSTRACT
Clinically, the Taylor spatial frame (TSF) is usually used to correct femoral deformity. The first step in correction is to analyze skeletal deformities and measure the center of rotation of angulation (CORA). Since this work needs to be done manually, the doctor's workload is heavy. Therefore, an automatic femoral deformity analysis system was proposed. Firstly, the Hough forest and constrained local models were trained on the femur image set. Then, the position and size of the femur in the X-ray image were detected by the trained Hough forest. Furthermore, the position and size served as the initial values of the trained constrained local models to fit the femoral contour. Finally, the anatomical axis line of the proximal femur and the anatomical axis line of the distal femur could be drawn according to the fitting results. According to these lines, CORA can be found. Compared with manual measurement by doctors, the average error of the hip joint orientation line was 1.7°, with a standard deviation of 1.75°, and the average error of the anatomical axis line of the proximal femur was 2.9°, with a standard deviation of 3.57°. The automatic femoral deformity analysis system meets the accuracy requirements of orthopedics and can significantly reduce the workload of doctors.
Subjects
Femur, Hip Joint, Femur/diagnostic imaging, Forests, Humans
ABSTRACT
With an ever-increasing number of synthetic chemicals being manufactured, it is unrealistic to expect that they will all be subjected to comprehensive and effective risk assessment. A shift from conventional animal testing to computer-aided methods is therefore an important step towards advancing the environmental risk assessments of chemicals. The aims of this study are two-fold: firstly, it examines the relationships between structural and physicochemical features of a diverse set of organic chemicals, and their acute aquatic toxicity towards Daphnia magna and Oryzias latipes using a classification tree approach. Secondly, it compares the efficiency and accuracy of the predictions of two modeling schemes: local models that are inherently restricted to a smaller subset of structurally-related substances, and a global model that covers a wider chemical space and a number of modes of toxic action. The classification tree-based models differentiate the organic chemicals into either 'highly toxic' or 'low to non-toxic' classes, based on internal and external validation criteria. These mechanistically-driven models, which demonstrate good performance, reveal that the key factors driving acute aquatic toxicity are lipophilicity, electrophilic reactivity, molecular polarizability and size. A comparative analysis of the performance of the two modeling schemes indicates that the local models, trained on homogeneous data sets, are less error prone, and therefore superior to the global model. Although the global models showed worse performance metrics compared to the local ones, their applicability domain is much wider, thereby significantly increasing their usefulness in practical applications for regulatory purposes. This demonstrates their advantage over local models and shows they are an invaluable tool for modeling heterogeneous chemical data sets.
Subjects
Toxicity Tests/methods, Chemical Water Pollutants/toxicity, Animals, Daphnia/drug effects, Organic Chemicals/toxicity, Quantitative Structure-Activity Relationship, Risk Assessment
ABSTRACT
Accurate glucose prediction along a long-enough time horizon is a key component for technology to improve type 1 diabetes treatment. Subjects with diabetes might benefit from supervision and control systems that accurately predict risks and trigger corrective actions early enough with improved mitigation. However, large intra-patient variability poses big challenges to glucose prediction. In previous works by the authors, clustering and local modeling techniques with seasonal stochastic models proved to be efficient, allowing for good glucose prediction accuracy for long prediction horizons. Continuous glucose monitoring (CGM) data were partitioned into fixed-length postprandial time subseries and clustered with Fuzzy C-Means to collect similar behaviors, enforcing seasonality at each cluster after subseries concatenation. Then, seasonal stochastic models were identified for each cluster and local predictions were integrated into a global prediction. However, free-living conditions do not support the fixed-length partition of CGM data since the duration of daily events is variable. In this work, a new algorithm is provided to overcome this constraint, allowing better coping with patient variability under variable-length time-stamped daily events in supervision and control applications. Besides predicted glucose, two real-time indices are additionally provided: a crispness index, indicating good representation of current glucose behavior by a single model, and a normality index, allowing for the detection of abnormal glucose behavior (unusual according to registered historical data). The framework is tested in a proof-of-concept in silico study with ten patients over four months of training data and two independent two-month validation datasets, with and without abnormal behaviors, from the distribution version of the UVA/Padova simulator extended with diverse sources of intra-patient variability.
ABSTRACT
The development and progression of numerous complex human diseases have been confirmed to be associated with microRNAs (miRNAs) by various experimental and clinical studies. Predicting potential miRNA-disease associations can help us understand the underlying molecular and cellular mechanisms of diseases and promote the development of disease treatment and diagnosis. Due to the high cost of conventional experimental verification, computational prediction of miRNA-disease associations is an efficient and economical alternative. Since previous computational models ignored the hubness phenomenon, we present a novel computational model of Bipartite Local models and Hubness-Aware Regression for MiRNA-Disease Association prediction (BLHARMDA). In this method, we first used known miRNA-disease associations to calculate the Jaccard similarity between miRNAs and between diseases, then utilized a modified kNN model in the bipartite local model method. As a result, we effectively alleviated the detriments from 'bad' hubs. BLHARMDA obtained AUCs of 0.9141 and 0.8390 in the global and local leave-one-out cross validation, respectively, which outperformed most of the previous models and demonstrated the high prediction performance of BLHARMDA. Besides, the standard deviation of 0.0006 in 5-fold cross validation confirmed our model's prediction stability, and the averaged prediction accuracy of 0.9120 showed the high precision of our model. In addition, to further evaluate our model's accuracy, we implemented BLHARMDA on three typical human diseases in three different types of case studies. As a result, 49 (Esophageal Neoplasms), 50 (Lung Neoplasms) and 50 (Carcinoma Hepatocellular) out of the top 50 related miRNAs were validated by recent experimental discoveries.
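Illustrative note: the first step above, computing Jaccard similarity between miRNAs from their known disease associations, can be sketched in a few lines. The association table below is invented for illustration (the identifiers `miR-A`, `d1`, etc. are hypothetical), and the sketch covers only the similarity step, not the hubness-aware kNN regression.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two association sets."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets share nothing; treat their similarity as 0.
    return len(a & b) / len(union) if union else 0.0


# Hypothetical known miRNA-disease associations.
known = {
    "miR-A": {"d1", "d2", "d3"},
    "miR-B": {"d2", "d3", "d4"},
    "miR-C": {"d5"},
}

# Pairwise miRNA-miRNA similarity derived from shared diseases.
names = sorted(known)
similarity = {
    (p, q): jaccard(known[p], known[q]) for p in names for q in names
}
print(similarity[("miR-A", "miR-B")])  # two shared diseases out of four
```

The same function gives disease-disease similarity when the table is inverted so that each disease maps to its set of associated miRNAs.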
Subjects
Genetic Predisposition to Disease, MicroRNAs/genetics, Genetic Models, Neoplasms/genetics, Hepatocellular Carcinoma/genetics, Computational Biology/methods, Esophageal Neoplasms/genetics, Humans, Liver Neoplasms/genetics, Lung Neoplasms/genetics, Psychological Regression
ABSTRACT
Comprehensive two-dimensional (2D) gas chromatography (GC×GC) coupled to mass spectrometry (MS, GC×GC-MS), which enhances selectivity compared to GC-MS analysis, can be used for non-directed analysis (non-target screening) of environmental samples. Additional tools that aid in identifying unknown compounds are needed to handle the large amount of data generated. These tools include retention indices for characterizing relative retention of compounds and prediction of such. In this study, two quantitative structure-retention relationship (QSRR) approaches for prediction of retention times (1tR and 2tR) and indices (linear retention indices (LRIs) and a new polyethylene glycol-based retention index (PEG-2I)) in GC×GC were explored, and their predictive power compared. In the first method, molecular descriptors combined with partial least squares (PLS) analysis were used to predict times and indices. In the second method, the commercial software package ChromGenius (ACD/Labs), based on a "federation of local models," was employed. Overall, the PLS approach exhibited better accuracy than the ChromGenius approach. Although average errors for the LRI prediction via ChromGenius were slightly lower, PLS was superior in all other cases. The average deviations between the predicted and the experimental value were 5% and 3% for the 1tR and LRI, and 5% and 12% for the 2tR and PEG-2I, respectively. These results are comparable to or better than those reported in previous studies. Finally, the developed model was successfully applied to an independent dataset and led to the discovery of 12 wrongly assigned compounds. The results of the present work represent the first-ever prediction of the PEG-2I.
ABSTRACT
Cellular adhesion provides one of the fundamental forms of biological interaction between cells and their surroundings, yet the continuum modelling of cellular adhesion has remained mathematically challenging. In 2006, Armstrong et al. proposed a mathematical model in the form of an integro-partial differential equation. Although successful in applications, a derivation from an underlying stochastic random walk has remained elusive. In this work we develop a framework by which non-local models can be derived from a space-jump process. We show how the notions of motility and a cell polarization vector can be naturally included. With this derivation we are able to include microscopic biological properties into the model. We show that particular choices yield the original Armstrong model, while others lead to more general models, including a doubly non-local adhesion model and non-local chemotaxis models. Finally, we use random walk simulations to confirm that the corresponding continuum model represents the mean field behaviour of the stochastic random walk.
Subjects
Cell Adhesion/physiology, Chemotaxis/physiology, Biological Models, Animals, Biomechanical Phenomena, Cell Adhesion Molecules/physiology, Cell Movement/physiology, Cell Polarity/physiology, Computational Biology, Computer Simulation, Humans, Mathematical Concepts, Stochastic Processes
ABSTRACT
Large datasets from -omics studies need to be deeply investigated. The aim of this paper is to provide a new method (LEM method) for the search of transcriptome and metabolome connections. The heuristic algorithm described here extends the classical canonical correlation analysis (CCA) to a high number of variables (without regularization) and combines good conditioning and fast computation in R. Reduced CCA models are summarized in PageRank matrices, the product of which gives a stochastic matrix that summarizes the self-avoiding walk covered by the algorithm. Then, a homogeneous Markov process applied to this stochastic matrix brings the probabilities of interconnection between genes to convergence, providing a selection of disjoint subsets of genes. This is an alternative to regularized generalized CCA for the determination of blocks within the structure matrix. Each gene subset is thus linked to the whole metabolic or clinical dataset that represents the biological phenotype of interest. Moreover, this selection process meets the needs of biologists, who often require small sets of genes for further validation or extended phenotyping. The algorithm is shown to work efficiently on three published datasets, resulting in meaningfully broadened gene networks.
Subjects
Algorithms, Nutrigenomics/methods, Humans, Statistical Models
ABSTRACT
The movement of cells during (normal and abnormal) wound healing is the result of biomechanical interactions that combine cell responses with growth factors as well as cell-cell and cell-matrix interactions (adhesion and remodelling). It is known that cells can communicate and interact locally and non-locally with other cells inside the tissues through mechanical forces that act locally and at a distance, as well as through long non-conventional cell protrusions. In this study, we consider a non-local partial differential equation model for the interactions between fibroblasts, macrophages and the extracellular matrix (ECM) via a growth factor (TGF-β) in the context of wound healing. For the non-local interactions, we consider two types of kernels (i.e., a Gaussian kernel and a cone-shaped kernel), two types of cell-ECM adhesion functions (i.e., adhesion only to higher-density ECM vs. adhesion to higher-/lower-density ECM) and two types of cell proliferation terms (i.e., with and without decay due to overcrowding). We investigate numerically the dynamics of this non-local model, as well as the dynamics of the localised versions of this model (i.e., those obtained when the cell perception radius decreases to 0). The results suggest the following: (i) local models explain normal wound healing and non-local models could also explain abnormal wound healing (although the results are parameter-dependent); (ii) the models can explain two types of wound healing, i.e., by primary intention, when the wound margins come together from the side, and by secondary intention when the wound heals from the bottom up.
Subjects
Extracellular Matrix, Wound Healing, Wound Healing/physiology, Cell Communication, Transforming Growth Factor beta/metabolism, Cell Proliferation
ABSTRACT
In practice, the degradation of rotating machines such as gas turbines depends on the quality of the construction and online operation of their dynamic state models and on the different physical phenomena affecting these machines, which can cause their total malfunction. To maintain stable operation, it is essential to describe these real dynamic behaviors correctly with reliable and robust representations, that is, with models that can be used in monitoring and diagnostics. To achieve the performance objectives in terms of security, reliability, availability, and operating safety, this work proposes the development of a fuzzy multi-model identification approach with states decoupled from the operating variables, applied to the monitoring of a TITAN 130 turbine. This fuzzy multi-model structure with decoupled states is of interest for the monitoring of industrial systems because it adapts to the different changes in the dynamic behavior of the system and makes it possible to represent the nonlinear behavior of the real system in a linear multi-model form without loss of information. Through the different implementations and the results obtained, this work shows how the gas turbine dynamics were reproduced with the proposed fuzzy multi-models, thus allowing better performance when they are exploited for the synthesis of the fault diagnosis strategy for this rotating machine.
ABSTRACT
In Reversed-Phase Liquid Chromatography, Quantitative Structure-Retention Relationship (QSRR) models for retention prediction of peptides can be built, starting from large sets of theoretical molecular descriptors. Good predictive QSRR models can be obtained after selecting the most informative descriptors. Reliable retention prediction may be an aid in the correct identification of proteins/peptides in proteomics and in chromatographic method development. Traditionally, global QSRR models are built, using a calibration set containing a representative range of analytes. In this study, a strategy is presented to build individual local Partial Least Squares (PLS) models for peptides, based on selected local calibration samples, most similar to the specific query peptide to be predicted. Similar local calibration peptides are selected from a possible calibration set. The calibration samples with the lowest Euclidean distances to the query peptide are considered most similar. Two Euclidean distances are investigated as similarity parameters: (i) in the autoscaled descriptor space and (ii) in the PLS factor space of the global calibration samples, both after variable selection by the Final Complexity Adapted Models (FCAM) method. The predictive abilities of individual local QSRR PLS models for peptides, developed with both Euclidean distances, are found to be significantly better than those of two global models, i.e. before and after FCAM variable selection. The predictive abilities of the local models, developed with distances calculated in the PLS factor space, were best.
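Illustrative note: the selection of local calibration samples by distance in the autoscaled descriptor space, option (i) above, can be sketched as below. This is a minimal sketch assuming autoscaling means centring each descriptor to zero mean and scaling it to unit standard deviation using the calibration set; the descriptor values, the function names and the choice `k = 2` are hypothetical, and the subsequent local PLS fit is not shown.

```python
import math
import statistics


def autoscale(rows):
    """Centre and scale each descriptor column using calibration statistics.

    Returns the scaled rows and a function that applies the same scaling
    to a new sample. Assumes no descriptor column is constant (a zero
    standard deviation would cause a division by zero).
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]

    def scale(row):
        return [(v - mu) / sd for v, mu, sd in zip(row, means, sds)]

    return [scale(r) for r in rows], scale


def nearest_calibration(rows, query, k):
    """Indices of the k calibration samples closest to the query peptide."""
    scaled, scale = autoscale(rows)
    q = scale(query)
    order = sorted(range(len(rows)), key=lambda i: math.dist(scaled[i], q))
    return order[:k]


# Hypothetical calibration peptides described by two descriptors each.
calibration = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [10.0, 100.0],
]
query_peptide = [2.4, 24.0]

local_set = nearest_calibration(calibration, query_peptide, k=2)
print(local_set)  # indices of the samples used to build the local model
```

Option (ii) would follow the same pattern with the PLS scores of the global calibration samples in place of the autoscaled descriptors.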
Subjects
Reverse-Phase Chromatography, Peptides, Calibration, Least-Squares Analysis, Proteins, Quantitative Structure-Activity Relationship
ABSTRACT
We provide a review of recent advancements in non-local continuous models for migration, mainly from the perspective of its involvement in embryonal development and cancer invasion. Particular emphasis is placed on spatial non-locality occurring in advection terms, used to characterize a cell's motility bias according to its interactions with other cellular and acellular components in its vicinity (e.g. cell-cell and cell-tissue adhesions, non-local chemotaxis), but we also briefly address spatially non-local source terms. Following a short introduction and description of applications, we give a systematic classification of available PDE models with respect to the type of featured non-localities and review some of the mathematical challenges arising from such models, with a focus on analytical aspects. This article is part of the theme issue 'Multi-scale analysis and modelling of collective migration in biological systems'.
Subjects
Cell Movement, Biological Models
ABSTRACT
The main aim of this study is to solve numerically the mathematical models describing cancer cell invasion of tissue with and without the effect of cell-cell and cell-matrix adhesion. The mathematical models studied here are systems of time-dependent reaction-diffusion-taxis equations in one- and two-dimensional spaces, which are formulated in local and non-local forms. There are some difficulties in finding their solutions via numerical methods. The main difficulty is to compute the non-local term appearing in one of the studied models, which requires more CPU time during simulations. The current paper aims to overcome this problem by applying a new meshless method, namely generalized moving least squares (GMLS) approximation in space, together with a semi-implicit backward differentiation formula of first order (SBDF1) in time. Based on GMLS theory, the non-local term is approximated without any difficulties. Moreover, a simple method based on the GMLS technique is presented to implement the boundary conditions. The discrete scheme obtained for both mathematical models is a linear system of algebraic equations per time step. The biconjugate gradient stabilized (BiCGSTAB) algorithm with a zero-fill incomplete lower-upper (ILU) preconditioner is used to solve the linear system at each time step. At the end of this paper, some simulation results are reported to show the behavior of cancer cell invasion in the local model and in the non-local model under reduced cell-cell adhesion and increased cell-matrix adhesion in one- and two-dimensional spaces, where two different types of point distributions have been considered in the square domain. The computational algorithms of the GMLS approximation and the developed numerical method for solving the non-local (local) model are included in the Appendix.
Subjects
Algorithms, Computer Simulation, Neoplasm Invasiveness, Diffusion, Humans, Least-Squares Analysis
ABSTRACT
BACKGROUND: Several QSAR methodology developments have shown promise in recent years. These include the consensus approach to generate the final prediction of a model, utilizing new, advanced machine learning algorithms and streamlining, standardization and automation of various QSAR steps. One approach that seems under-explored is at-the-runtime generation of local models specific to individual compounds. This approach was quite likely limited by the computational requirements, but with current increases in processing power and the widespread availability of cluster-computing infrastructure, this limitation is no longer that severe. RESULTS: We propose a new QSAR methodology: aiQSAR, whose aim is to generate endpoint predictions directly from the input dataset by building an array of local models generated at-the-runtime and specific for each compound in the dataset. The local group of each compound is selected on the basis of fingerprint similarities and the final prediction is calculated by integrating the results of a number of autonomous mathematical models. The method is applicable to regression, binary classification and multi-class classification and was tested on one dataset for each endpoint type: bioconcentration factor (BCF) for regression, Ames test for binary classification and Environmental Protection Agency (EPA) acute rat oral toxicity ranking for multi-class classification. As part of this method, the applicability domain of each prediction is assessed through the applicability domain measure, calculated on the basis of the fingerprint similarities in each local group of compounds. CONCLUSIONS: We outline the methodology for a new QSAR-based predictive tool whose advantages are automation, group-specific approach to modelling and simplicity of execution. Our aim now will be to develop this method into a stand-alone software tool. We hope that eventual adoption of our tool would make QSAR modelling more accessible and transparent. 
Our methodology could be used as an initial modelling step, to predict new compounds by simply loading the training dataset as an input. Predictions could then be further evaluated and refined either by other tools or through optimization of aiQSAR parameters.
ABSTRACT
Predicting forage feed value is a vital part of estimating ruminant performances. Most near-infrared (NIR) reflectance calibration models have been developed on oven-dried green forages, but preserved forages such as hays or silages are a significant part of real-world farm practice. Fresh and preserved forages give largely similar fodder, but drying or ensiling processes could modify preserved forage spectra which would make the oven-dried green forage model unsuitable to use on preserved forage samples. The aim of this study was to monitor the performance of oven-dried green forage calibration models on a set of hay and silage to predict their nutritive value. Local and global approaches were tested and 1025 green permanent grassland forages, 46 types of hay, and 27 types of silage were used. The samples were scanned by NIR spectroscopy and analyzed for nitrogen, neutral detergent fiber, acid detergent fiber, and pepsin-cellulase dry matter digestibility (PCDMD). Local and global calibrations were developed on 975 oven-dried green forage spectra and tested on 50 samples of oven-dried green forages, 46 samples of hay, and 27 samples of silage. For oven-dried green forage and hay validation sets, Mahalanobis distance (H) between these samples and the calibration population center was lower than 3. No significant standard error of prediction differences was obtained when calibration models were applied to oven-dried green forage and hay validation sets. For silage, the H-distance was higher than 3, meaning that calibration models built from oven-dried green forages cannot be applied to silage samples. We conclude that local calibration outperforms global strategy on predicting the PCDMD of oven-dried green forages and hay.
Subjects
Animal Feed/analysis, Silage/analysis, Near-Infrared Spectroscopy/methods, Near-Infrared Spectroscopy/standards, Animals, Calibration, Nutritive Value, Poaceae/chemistry, Reproducibility of Results, Ruminants
ABSTRACT
Thirty-two Quantitative Structure-Property Relationship (QSPR) models were constructed for prediction of aqueous intrinsic solubility of liquid and crystalline chemicals. Data sets contained 1022 liquid and 2615 crystalline compounds. Multiple Linear Regression (MLR), Support Vector Machine (SVM) and Random Forest (RF) methods were used to construct global models, and k-nearest neighbour (kNN), Arithmetic Mean Property (AMP) and Local Regression Property (LoReP) were used to construct local models. A set of the best QSPR models was obtained: for liquid chemicals with RMSE (root mean square error) of prediction in the range 0.50-0.60 log units; for crystalline chemicals 0.80-0.90 log units. In the case of global models, the large number of descriptors makes mechanistic interpretation difficult. The local models use only one or two descriptors, so that a medicinal chemist working with sets of structurally related chemicals can readily estimate their solubility. However, construction of stable local models requires the presence of closely related neighbours for each chemical considered. It is probable that a consensus of global and local QSPR models will be the optimal approach for construction of stable predictive QSPR models with mechanistic interpretation.
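Illustrative note: a local model of the nearest-neighbour kind described above can be sketched with a single descriptor. The sketch assumes that the arithmetic-mean approach predicts a compound's property as the plain average of the measured values of its k closest neighbours; the abstract does not define AMP in detail, so this reading, the descriptor values and the function name `knn_mean_prediction` are all assumptions made for illustration.

```python
def knn_mean_prediction(train, query, k=2):
    """Predict a property as the mean over the k nearest training compounds.

    `train` is a list of (descriptor, property) pairs and `query` is the
    descriptor value of the compound to predict. Similarity is measured
    as the absolute difference in the single descriptor.
    """
    if k > len(train):
        raise ValueError("k cannot exceed the number of training compounds")
    nearest = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Hypothetical compounds: (descriptor value, log solubility).
training_set = [
    (1.0, -1.0),
    (1.2, -1.4),
    (3.0, -3.5),
    (3.1, -3.7),
]

predicted = knn_mean_prediction(training_set, query=1.1, k=2)
print(predicted)
```

The sketch also shows why such models need closely related neighbours: if the nearest training compounds are far from the query, the average no longer reflects the query's own chemistry.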