ABSTRACT
The accurate quantification of tumor-infiltrating immune cells is crucial to uncover their role in tumor immune escape, to determine patient prognosis and to predict response to immune checkpoint blockade. Current state-of-the-art methods that quantify immune cells from tumor biopsies using gene expression data rely on computational deconvolution methods that suffer from multicollinearity and estimation errors, resulting in overestimation or underestimation of the diversity of infiltrating immune cells and their quantity. To overcome these limitations, we developed MIXTURE, a new noise-constrained recursive feature selection algorithm based on ν-support vector regression and validated immune cell molecular signatures. MIXTURE provides increased robustness in cell type identification and proportion estimation, outperforms current methods, and is available to the wider scientific community. We applied MIXTURE to transcriptomic data from tumor biopsies and found relevant novel associations between the components of the immune infiltrate and molecular subtypes, tumor driver biomarkers, tumor mutational burden, microsatellite instability, intratumor heterogeneity, cytolytic score, programmed cell death ligand 1 expression, patient survival and response to anti-cytotoxic T-lymphocyte-associated antigen 4 and anti-programmed cell death protein 1 immunotherapy.
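The basic ν-SVR deconvolution step on which MIXTURE builds can be sketched as follows. This is a minimal, hypothetical sketch that assumes scikit-learn's `NuSVR` with a linear kernel, a signature matrix with genes in rows and cell types in columns, and CIBERSORT-style post-processing (negative coefficients set to zero, the rest rescaled to sum to one). It does not implement MIXTURE's noise-constrained recursive feature selection. The helper name `nu_svr_deconvolve`, the ν grid and the synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import NuSVR


def nu_svr_deconvolve(signature, mixture, nus=(0.25, 0.5, 0.75)):
    """Estimate cell-type proportions of one sample (hypothetical helper).

    signature: (n_genes, n_cell_types) reference expression matrix.
    mixture:   (n_genes,) bulk expression of the tumor sample.
    """
    # Global z-scoring (whole matrix, not per column) keeps the relative
    # scale of the cell-type columns, so coefficients stay comparable.
    X = (signature - signature.mean()) / signature.std()
    y = (mixture - mixture.mean()) / mixture.std()
    best = None
    for nu in nus:
        model = NuSVR(nu=nu, C=1.0, kernel="linear").fit(X, y)
        rmse = np.sqrt(np.mean((model.predict(X) - y) ** 2))
        if best is None or rmse < best[0]:
            best = (rmse, model.coef_.ravel())
    # Negative weights cannot be proportions: clip them, then rescale to 1.
    w = np.clip(best[1], 0.0, None)
    return w / w.sum() if w.sum() > 0 else w


rng = np.random.default_rng(0)
S = rng.uniform(1.0, 10.0, size=(200, 4))   # synthetic signature matrix
true_p = np.array([0.6, 0.3, 0.1, 0.0])     # synthetic true proportions
est = nu_svr_deconvolve(S, S @ true_p)
print(np.round(est, 3))
```

Because the synthetic mixture is an exact combination of the signature columns, the largest estimate should fall on the first cell type. MIXTURE adds its noise-constrained recursive elimination of cell types on top of a fit of this kind.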
Subjects
Nucleic Acid Databases, Neoplastic Gene Expression Regulation/immunology, Immunotherapy, Immunological Models, Neoplasms, Support Vector Machine, Transcriptome/immunology, Humans, Neoplasms/genetics, Neoplasms/immunology, Neoplasms/therapy
ABSTRACT
Plant pathogens, such as fungi, bacteria, and viruses, can cause serious damage to crops and significantly reduce yield and quality. Bacterial diseases of agronomic crops, however, have been little studied. The present study aims to isolate and identify bacteria recovered from symptomatic maize (Zea mays) leaves collected from field samples in the province of Cordoba, Argentina. Bacterial strains were identified using whole-cell matrix-assisted laser desorption ionization time-of-flight mass spectrometry and 16S rDNA sequencing. Members of the genera Exiguobacterium and Curtobacterium were dominant in the studied plant material. Two strains (RC18-1/2 and RC18-3/1) were selected for further studies. The pathogenicity test showed that plants inoculated with Curtobacterium sp. RC18-1/2 exhibited the same symptoms as those previously detected in the field. To our knowledge, this study provides the first evidence of the isolation of a pathogenic Curtobacterium strain from maize. Effective crop disease management will require the use of integrated strategies, such as resistant cultivars and/or biocontrol agents.
Subjects
Actinomycetales, Zea mays, Actinomycetales/genetics, Argentina, Bacteria, Ribosomal DNA/genetics, Fungi/genetics, Plants, Zea mays/microbiology
ABSTRACT
Rural land valuation plays an important role in the development of land use policies for agricultural purposes. The advance of computational software and machine learning methods has enhanced mass appraisal methodologies for modeling and predicting economic values. New machine learning methods, like tree-based regression models, have been proposed as an alternative to linear regression to predict economic values from ancillary variables, since these algorithms are able to handle non-normality and non-linearity in the data. However, regression trees are commonly estimated assuming independent rather than spatially correlated data. This study aims to build a tree-based regression model that will help to tackle methodological problems related to the determination of prices of rural lands. The Quantile Regression Forest (QRF) algorithm was used to provide a regression model to predict and assess the uncertainty associated with model-derived predictions. However, the classical QRF ignores the autocorrelation underlying spatialized land values. The objective of this work was to develop, implement, and evaluate a spatial version of QRF, named sQRF, for computer-assisted mass appraisal of rural land values accounting for information from neighboring sites. We compared predictions of land values from sQRF with those obtained from spatial random forest, kriging regression, and linear regression models. sQRF performed well in predicting rural land values; indeed, it performed better than multiple linear regression. An important feature of sQRF is its ability to produce a direct uncertainty measure to assess the goodness of the predictions. Land values reflect a complex mix of agricultural returns, localization, and access to markets, which can be predicted from ancillary environmental variables. Good predictive models are essential to determine land values for multiple purposes including territorial taxation.
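The uncertainty measure that a quantile regression forest provides comes from a weighted empirical distribution of the training responses (Meinshausen's construction): each training site gets a weight according to how often it shares a leaf with the query site. The sketch below assumes the leaf index of every training observation and of the query site in each tree has already been extracted from a fitted forest; the helper names are hypothetical and the spatial extension (sQRF) is not shown.

```python
import numpy as np


def qrf_weights(train_leaves, query_leaves):
    """Weights w_i(x) of the training observations for one query point.

    train_leaves: (n_train, n_trees) leaf id of each training observation per tree.
    query_leaves: (n_trees,) leaf id reached by the query point in each tree.
    Assumes every query leaf holds at least one training observation.
    """
    n_train, n_trees = train_leaves.shape
    w = np.zeros(n_train)
    for t in range(n_trees):
        same_leaf = train_leaves[:, t] == query_leaves[t]
        w[same_leaf] += 1.0 / same_leaf.sum()   # equal share within the leaf
    return w / n_trees                           # average over trees; sums to 1


def qrf_quantile(y_train, w, alpha):
    """Smallest y whose weighted CDF F(y | x) reaches alpha."""
    order = np.argsort(y_train)
    cdf = np.cumsum(w[order])
    idx = min(np.searchsorted(cdf, alpha - 1e-12), len(cdf) - 1)
    return y_train[order][idx]


# Toy forest: two trees, four training sites (land values, arbitrary units)
y = np.array([1.0, 2.0, 3.0, 4.0])
leaves = np.array([[0, 0], [0, 1], [1, 1], [1, 1]])
w = qrf_weights(leaves, np.array([0, 1]))
lo, med, hi = (qrf_quantile(y, w, a) for a in (0.05, 0.5, 0.95))
print(med, hi - lo)   # median prediction and width of a 90% interval
```

The interval width `hi - lo` is one direct per-site uncertainty measure of the kind the abstract describes.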
Subjects
Environmental Monitoring, Machine Learning, Algorithms, Linear Models, Spatial Analysis
ABSTRACT
The aim of this work was to fit and compare three non-linear models (Wood, MilkBot and diphasic) for modelling lactation curves under two approaches: with and without a cow random effect. Knowing the behaviour of lactation curves is critical for decision-making on a dairy farm. The course of milk production along each lactation must be modelled not only at the mean population level (dairy farm), but also at the individual level (cow-lactation). The fits were made for a group of high-production and high-reproduction dairy farms, for first and third lactations in cool seasons. A total of 2167 complete lactations were involved, of which 984 were first lactations and the remaining 1183 were third lactations (19 382 milk yield tests). PROC NLMIXED in SAS was used to make the fits and estimate the model parameters. The diphasic model proved computationally complex and of little practical use. Regarding the classical Wood and MilkBot models, although the information criteria favoured MilkBot, it did not significantly improve the estimation of production indicators. The Wood model was found to be a good option for fitting the expected value of lactation curves. Furthermore, all three models fitted better when the subject (cow) random effect, which is related to the magnitude of production, was considered. The random effect improved the predictive potential of the models, but it did not have a significant effect on the production indicators derived from the lactation curves, such as milk yield and days in milk to peak.
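The Wood and MilkBot curves compared above have closed forms. For Wood's model y(t) = a·t^b·e^(−ct), setting dy/dt = y(b/t − c) = 0 gives the peak at t = b/c. The sketch below evaluates both curves and derives peak and total-yield indicators; the parameter values are illustrative assumptions, not estimates from this study, and the mixed-model fit with a cow random effect (done here with PROC NLMIXED) is not reproduced.

```python
import math


def wood(t, a, b, c):
    """Wood lactation curve: daily milk yield at day t in milk."""
    return a * t ** b * math.exp(-c * t)


def milkbot(t, a, b, c, d):
    """MilkBot curve: scale a, ramp b, offset c, decay d."""
    return a * (1 - math.exp((c - t) / b) / 2) * math.exp(-d * t)


def wood_peak(a, b, c):
    """Days in milk to peak (dy/dt = 0 gives t = b/c) and peak daily yield."""
    t_peak = b / c
    return t_peak, wood(t_peak, a, b, c)


def total_yield(curve, days=305, **params):
    """Lactation yield approximated by summing daily predictions."""
    return sum(curve(t, **params) for t in range(1, days + 1))


# Illustrative parameters only
t_peak, y_peak = wood_peak(a=20.0, b=0.2, c=0.004)
print(t_peak, round(y_peak, 2))
print(round(total_yield(wood, a=20.0, b=0.2, c=0.004)))
print(round(total_yield(milkbot, a=40.0, b=20.0, c=-0.7, d=0.0025)))
```

In a mixed-model fit, cow-level random effects would shift parameters such as `a` per animal, which changes the level of the curve; this is consistent with the abstract's observation that the random effect relates to the magnitude of production.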
Subjects
Cattle/physiology, Lactation/physiology, Nonlinear Dynamics, Animals, Dairying/methods, Female, Biological Models, Parity, Seasons
ABSTRACT
Identifying population genetic structure (PGS) is crucial for breeding and conservation. Several clustering algorithms are available to identify the underlying PGS in genetic data from maize genotypes. In this work, six methods to identify PGS from unlinked molecular marker data were compared using simulated and experimental data consisting of multilocus biallelic genotypes. Datasets were designed under different biological scenarios characterized by three levels of genetic divergence among populations (low, medium, and high FST) and two numbers of sub-populations (K=3 and K=5). We compared the relative performance of hierarchical and non-hierarchical clustering, model-based clustering (STRUCTURE), and clustering from neural networks (SOM-RP-Q), using the clustering error rate of genotypes into discrete sub-populations as the comparison criterion. In scenarios with a high level of divergence among genotype groups, all methods performed well. With a moderate level of genetic divergence (FST=0.2), the SOM-RP-Q and STRUCTURE algorithms performed better than hierarchical and non-hierarchical clustering. In all simulated scenarios with low genetic divergence and in the experimental SNP maize panel (largely unlinked), SOM-RP-Q achieved the lowest clustering error rate. The SOM algorithm used here was more effective than the other evaluated methods for sparse unlinked genetic data.
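Because cluster labels are arbitrary, a clustering error rate is usually computed after matching predicted clusters to true sub-populations. The sketch below is one common way to do this, assuming the best one-to-one matching (found by brute force over permutations, which is feasible for K = 3 or K = 5) defines the error; the exact matching rule used in the study is not stated in the abstract, and the function name is hypothetical.

```python
from itertools import permutations


def clustering_error_rate(true_labels, pred_labels):
    """Fraction of genotypes misassigned under the best one-to-one label matching."""
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(pred_labels))
    k = max(len(true_ids), len(pred_ids))
    # Pad with placeholder ids so extra predicted clusters map to "no population".
    true_ids += [("unused", i) for i in range(k - len(true_ids))]
    best_correct = 0
    for perm in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        correct = sum(mapping[p] == t for t, p in zip(true_labels, pred_labels))
        best_correct = max(best_correct, correct)
    return 1 - best_correct / len(true_labels)


# K = 3 sub-populations, six genotypes; labels 0 and 1 are swapped by the method
print(clustering_error_rate([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))
print(clustering_error_rate([0, 0, 1, 1, 2, 2], [1, 1, 0, 2, 2, 2]))
```

The first call has no genuine errors despite the swapped labels; in the second, one genotype lands in the wrong group.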
Subjects
Algorithms, Population Genetics/methods, Genetic Models, Molecular Probes/genetics, Alleles, Cluster Analysis, Computer Simulation, Genotype, Single Nucleotide Polymorphism, Zea mays/genetics
ABSTRACT
Several recreational surface waters in Salta, Argentina, were selected to assess their quality. Seventy percent of the measurements exceeded at least one of the limits established by international legislation, rendering the waters unsuitable for their use. Multivariate techniques were applied to interpret the results of this complex dataset. Because of the variability observed in its data, Arenales River was divided into two sections, upstream and downstream, representing low- and high-pollution sites, respectively; cluster analysis supported this differentiation. Arenales River downstream and Campo Alegre Reservoir were the most different environments, and the Vaqueros and La Caldera rivers were the most similar. Canonical correlation analysis allowed exploration of correlations between physicochemical and microbiological variables, except in the two sections of Arenales River, and principal component analysis revealed relationships among the nine measured variables in all aquatic environments. Variable loadings showed that Arenales River downstream was impacted by industrial and domestic activities, Arenales River upstream was affected by agricultural activities, Campo Alegre Reservoir was disturbed by anthropogenic and ecological effects, and the La Caldera and Vaqueros rivers were influenced by recreational activities. Discriminant analysis identified the subgroups of variables responsible for seasonal and spatial variations. Enterococcus, dissolved oxygen, conductivity, E. coli, pH, and fecal coliforms are sufficient to describe the spatial differences in quality among the aquatic environments. Regarding seasonal variations, dissolved oxygen, conductivity, fecal coliforms, and pH can be used to describe water quality during the dry season, and dissolved oxygen, conductivity, total coliforms, E. coli, and Enterococcus during the wet season. Thus, multivariate techniques made it possible to optimize monitoring tasks and minimize the costs involved.
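The variable loadings used above to attribute each site's pollution sources come from a PCA of standardized variables. The sketch below shows that computation, assuming a samples × variables matrix (for example, nine water-quality variables per sampling event); the data here are random placeholders, not the Salta measurements, and the function name is hypothetical.

```python
import numpy as np


def pca_standardized(data):
    """PCA on the correlation matrix (each variable standardized first).

    data: (n_samples, n_variables), e.g. samples x {pH, conductivity, E. coli, ...}.
    Returns explained-variance ratios, loadings (variables x PCs) and scores.
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # SVD of the standardized matrix: z = U S Vt, rows of Vt are the PC axes.
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    eigvals = s ** 2 / (z.shape[0] - 1)
    explained = eigvals / eigvals.sum()
    # Eigenvector * sqrt(eigenvalue) = correlation between variable and PC.
    loadings = vt.T * np.sqrt(eigvals)
    scores = u * s
    return explained, loadings, scores


rng = np.random.default_rng(0)
data = rng.normal(size=(40, 9))          # placeholder: 40 samples, 9 variables
explained, loadings, scores = pca_standardized(data)
print(np.round(explained[:2], 3))        # share of variance on PC1 and PC2
print(np.round(loadings[:, 0], 2))       # loadings of the 9 variables on PC1
```

Standardizing first matters here because the variables (pH, conductivity, bacterial counts) are on very different scales.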
Subjects
Environmental Monitoring/methods, Recreation, Water Pollutants/analysis, Agriculture, Argentina, Discriminant Analysis, Escherichia coli, Oxygen/analysis, Principal Component Analysis, Rivers, Seasons, Water/analysis, Water Quality
ABSTRACT
In Pinus pinea, cone to pine nut yield (total pine nut weight expressed as a percentage of cone weight), an important crop trait, is decreasing worldwide. This phenomenon is of great concern, since the nuts of this species are in high demand. Cone weight, seed and pine nut morphometry, and pine nut yield were monitored in a non-native area in Chile for 10 years. For this purpose, 560 cones and the seeds and pine nuts they contained were counted, measured and weighed in a multi-environment study involving seven plantations. Seed and pine nut damage was evaluated. Two contrasting categories of cone weight (heavy/light) were defined. Cone to pine nut yield (PY) and other traits were calculated and compared between categories using a mixed linear model. Regression trees were used to explain PY variability. Cone weight was higher than in the species' native range (474 g vs 300 g on average). Pine nut number per cone and PY were significantly higher in the heavy cone category than in the light cone category (125 vs 89 units, and 4.05 vs 3.62%, respectively). The percentage of damaged seeds was lower in heavy than in light cones (9.0% vs 15.9%). Thus, PY depended on seed and pine nut morphometry as well as on seed health. Management practices, such as fertilization and irrigation, could be used to boost production of heavy cones and consequently increase PY.
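Regression trees explain PY variability by repeatedly choosing the split of a predictor that most reduces the residual sum of squares (SSE). A minimal sketch of one such split search for a single numeric predictor follows, using the standard CART criterion; the cone-weight and PY values are invented for illustration and are not data from this study.

```python
def best_split(x, y):
    """Best single split of numeric predictor x for response y (CART, SSE criterion).

    Returns (threshold, sse_after_split); the threshold is the midpoint between
    consecutive distinct sorted x values, or None if no split helps.
    """
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]

    def sse(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    best = (None, sse(ys))  # start from "no split"
    for i in range(1, len(ys)):
        if xs[i] == xs[i - 1]:
            continue  # tied x values cannot be separated
        cost = sse(ys[:i]) + sse(ys[i:])
        if cost < best[1]:
            best = ((xs[i - 1] + xs[i]) / 2, cost)
    return best


# Hypothetical cone weights (g) and PY (%) for six cones
cone_weight = [310, 350, 420, 480, 510, 560]
py = [3.5, 3.6, 3.7, 4.0, 4.1, 4.1]
print(best_split(cone_weight, py))
```

A full tree applies this search to every candidate predictor (morphometric traits, seed damage, and so on) and recurses into each resulting node.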
Subjects
Nuts, Pinus, Seeds, Chile, Linear Models
ABSTRACT
A dataset of clinical electroencephalogram (EEG) recordings from control individuals, acquired under standard protocols (eyes-closed wakefulness, eyes-open wakefulness, hyperventilation, and optostimulation), is quantified with information theory metrics, namely permutation Shannon entropy and permutation Lempel-Ziv complexity, to identify functional changes. This work implements linear mixed-effects models (LMEMs) for confirmatory hypothesis testing. The results show that EEGs have high variability for both metrics and that the two metrics are positively correlated. When permutation Lempel-Ziv complexity and permutation Shannon entropy are used simultaneously, the means of the four states are distinguishable from each other. However, when the metrics are used separately, the differences in permutation Lempel-Ziv complexity or permutation Shannon entropy between some states were not statistically significant. This shows that the joint use of both metrics provides more information than the use of either one alone. Despite their wide use in medicine, LMEMs have not commonly been applied to simultaneously model metrics that quantify EEG signals. Modeling EEGs with a model that characterizes more than one response variable and their possible correlations represents a new way of analyzing EEG data in neuroscience.
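Both metrics start from the same Bandt-Pompe symbolization: each window of `m` samples (spaced `tau` apart) is replaced by its rank-order pattern. Permutation entropy is the normalized Shannon entropy of the pattern frequencies, and permutation Lempel-Ziv complexity applies the LZ76 parsing to the pattern sequence. The sketch below assumes m = 3 and tau = 1 (the study's settings are not given in the abstract) and returns the raw LZ phrase count, since normalizations differ between studies; function names are hypothetical.

```python
import math
import random
from collections import Counter


def ordinal_patterns(signal, m=3, tau=1):
    """Bandt-Pompe symbolization: the rank order of each embedded window."""
    n = len(signal) - (m - 1) * tau
    patterns = []
    for i in range(n):
        window = [signal[i + j * tau] for j in range(m)]
        # argsort of the window; ties keep their temporal order
        patterns.append(tuple(sorted(range(m), key=window.__getitem__)))
    return patterns


def permutation_entropy(signal, m=3, tau=1):
    """Shannon entropy of pattern frequencies, normalized to [0, 1] by log(m!)."""
    counts = Counter(ordinal_patterns(signal, m, tau))
    total = sum(counts.values())
    h = -sum(c / total * math.log(c / total) for c in counts.values())
    return h / math.log(math.factorial(m))


def lempel_ziv_complexity(symbols):
    """LZ76 complexity: number of phrases in the exhaustive-history parsing."""
    codes = {p: chr(65 + k) for k, p in enumerate(sorted(set(symbols)))}
    s = "".join(codes[p] for p in symbols)
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def permutation_lz(signal, m=3, tau=1):
    """Lempel-Ziv complexity of the ordinal-pattern sequence (raw phrase count)."""
    return lempel_ziv_complexity(ordinal_patterns(signal, m, tau))


rng = random.Random(0)
noise = [rng.random() for _ in range(5000)]
print(permutation_entropy(noise), permutation_lz(noise))
```

A monotonic signal yields a single pattern (entropy 0, minimal complexity), while white noise approaches entropy 1; real EEG epochs fall in between, and the two numbers per epoch are what a bivariate LMEM would model jointly.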
ABSTRACT
Over the last 20 years, begomoviruses have emerged as devastating pathogens, limiting the production of different crops worldwide. Weather conditions can increase vector populations, with negative effects on crop production. In this work, we evaluated the relationship between begomovirus incidence and weather before and during the crop cycle. Soybean and bean fields in north-western (NW) Argentina were monitored between 2001 and 2018 and classified as moderate (≤50%) or severe (>50%) according to begomovirus incidence. Bean golden mosaic virus (BGMV) and soybean blistering mosaic virus (SbBMV) were the predominant begomoviruses in bean and soybean crops, respectively. Nearly 200 bio-meteorological variables were constructed by summarizing climatic variables over 10-day periods from July to November of each crop year. The studied variables included temperature, precipitation, relative humidity, wind (speed and direction), pressure, cloudiness, and visibility. For bean, high maximum winter temperatures, low spring humidity, and precipitation 10 days before planting were correlated with severe incidence. In soybean, high temperatures in late winter and in the pre-sowing period, and low spring precipitation, were good predictors of high begomovirus incidence. The results suggest that temperature and pre-sowing precipitation can be used to predict incidence status [predictive accuracy: 80% (bean) and 75% (soybean)]. Thus, these variables can be incorporated into early warning systems for crop management decision-making to reduce the impact of the virus on bean and soybean crops.
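The bio-meteorological variables described above are summaries of daily weather over 10-day periods. The sketch below shows one way to build such features, assuming daily records of (date, maximum temperature, rainfall), periods of days 1-10, 11-20 and 21 to month end, and mean/sum as the summaries; the record layout, feature names and example data are hypothetical.

```python
from collections import defaultdict
from datetime import date

MONTH_ABBR = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def dekad_features(records, months=range(7, 12)):
    """Summarize daily weather into 10-day periods (July to November by default).

    records: iterable of (date, tmax, rain_mm) tuples (hypothetical layout).
    Period 1 = days 1-10, period 2 = days 11-20, period 3 = day 21 to month end.
    """
    groups = defaultdict(list)
    for day, tmax, rain in records:
        if day.month in months:
            period = min((day.day - 1) // 10 + 1, 3)
            groups[(day.month, period)].append((tmax, rain))
    features = {}
    for (month, period), rows in sorted(groups.items()):
        label = f"{MONTH_ABBR[month - 1]}_d{period}"
        features[label + "_tmax_mean"] = sum(r[0] for r in rows) / len(rows)
        features[label + "_rain_sum"] = sum(r[1] for r in rows)
    return features


# Synthetic July: tmax equals the day number, 1 mm of rain every day
july = [(date(2018, 7, d), float(d), 1.0) for d in range(1, 32)]
features = dekad_features(july)
print(features["Jul_d3_rain_sum"])
```

With five months, three periods and several variables and statistics per period, the feature count grows quickly, which is how a set of roughly 200 candidate predictors arises.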
Subjects
Begomovirus, Glycine max, Begomovirus/genetics, Argentina/epidemiology, Incidence, Weather, Agricultural Crops
ABSTRACT
Regional mapping of herbicide sorption to soil is essential for risk assessment. However, analytical quantification of the adsorption coefficient (Kd) in large-scale studies is too costly; therefore, the question arises of how well Kd can be spatially predicted from samples. Spatial Bayesian regression (BR) is a relatively new technique in the agricultural and natural resources sciences that converts spatially discrete samples into maps covering continuous spatial domains. The objective of this work was to unveil herbicide sorption to soil at the landscape scale by developing a predictive BR model. We integrated a large set of ancillary soil and climate covariables from sites with Kd measurements into a spatial mixed model including site random effects. The models were fitted using glyphosate and atrazine Kd values, determined at 80 and 120 sites, respectively, in central Argentina. For model assessment, global and point-wise prediction errors were obtained by cross-validation; residual variability was estimated by bootstrap to compare BR with regression kriging. The BR spatial predictions outperformed regression kriging. The glyphosate Kd model (root mean square prediction error, 13% of the mean) included aluminum oxides, pH, and clay content, whereas the atrazine Kd model (root mean square prediction error, 27%) depended strongly on soil organic carbon and clay and on climatic variables related to water availability. Spatial modeling of a complex edaphic process such as herbicide sorption to soils enhanced environmental interpretations. An efficient approach for spatial mapping provides a modern perspective on the study of herbicide sorption to soil.
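The prediction errors above are reported as root mean square prediction error (RMSPE) relative to the mean Kd, obtained by cross-validation. The sketch below computes a leave-one-out RMSPE as a percentage of the mean; an ordinary least-squares model stands in for the spatial Bayesian model purely to illustrate the error metric, and the covariates and data are synthetic assumptions.

```python
import numpy as np


def loo_rmspe_percent(X, y):
    """Leave-one-out RMSPE of an OLS model, as a percentage of the mean response.

    X: (n_sites, n_covariates) without intercept; y: (n_sites,) measured Kd.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i            # hold out site i
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        errors[i] = y[i] - A[i] @ beta      # prediction error at the held-out site
    rmspe = np.sqrt(np.mean(errors ** 2))
    return 100 * rmspe / y.mean()


rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(80, 3))         # synthetic covariates (e.g. clay, pH, oxides)
y = 20 + X @ np.array([5.0, -3.0, 2.0]) + rng.normal(0, 1, 80)
print(round(loo_rmspe_percent(X, y), 1))
```

Swapping the OLS fit for a spatial model (or regression kriging) inside the loop gives directly comparable percentages, which is the kind of comparison reported in the abstract.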
Subjects
Atrazine, Herbicides, Soil Pollutants, Adsorption, Bayes Theorem, Carbon, Herbicides/analysis, Soil, Soil Pollutants/analysis
ABSTRACT
MOTIVATION: Difference gel electrophoresis (DIGE)-based protein expression analysis allows the relative expression of proteins to be assessed in two biological samples labeled with different dyes (Cy5 and Cy3 CyDyes). A reference sample (Cy2 CyDye) is also run in the same gel for spot matching during image analysis and for volume normalization. The standard statistical techniques for identifying differentially expressed (DE) proteins are the calculation of fold-changes and the comparison of treatment means by the t-test. These analyses rarely account for other experimental effects, such as CyDye and gel effects, which could be important sources of noise when detecting treatment effects. RESULTS: We propose to identify DIGE DE proteins using a two-stage linear mixed model. The proposal consists of splitting the overall model for the measured intensity into two interconnected models. First, we fit a normalization model that accounts for general experimental effects, such as gel and CyDye effects, as well as for the features of the associated random-term distributions. Second, we fit a model that uses the residuals from the first step to account for differences between treatments on a protein-by-protein basis. The modeling strategy was evaluated using data from a melanoma cell study. We found that a heteroskedastic model in the first stage, which also accounts for CyDye and gel effects, best normalized the data while allowing efficient estimation of the treatment effects. The Cy2 reference channel was used as a covariate in the normalization model to avoid skewness of the residual distribution. Its inclusion improved the detection of DE proteins in the second stage.
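The two-stage idea (normalize for gel and dye effects across all spots, then test treatments protein by protein on the residuals) can be sketched as below. This is a deliberately simplified, hypothetical version: stage 1 uses ordinary least squares with fixed gel and dye effects instead of a heteroskedastic linear mixed model, and stage 2 uses a Welch t statistic instead of a per-protein mixed model. The optional `cy2` argument stands in for the Cy2 reference covariate, and the simulated dye-swap design is an assumption.

```python
import numpy as np


def two_stage_dige(y, gel, dye, treat, protein, cy2=None):
    """Simplified two-stage analysis of DIGE log spot volumes.

    All arguments are 1-D arrays with one entry per spot measurement:
    y = log volume, gel = gel index, dye = 0 (Cy3) / 1 (Cy5), treat = 0 / 1,
    protein = protein index, cy2 = optional log Cy2 reference volume.
    """
    cols = [np.ones_like(y)]
    cols += [(gel == g).astype(float) for g in np.unique(gel)[1:]]  # gel effects
    cols.append(dye.astype(float))                                   # dye effect
    if cy2 is not None:
        cols.append(cy2)                                             # reference covariate
    X = np.column_stack(cols)
    # Stage 1: normalization model shared by all proteins (fixed effects, OLS).
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Stage 2: per-protein treatment contrast on the residuals (Welch t).
    t_stats = {}
    for p in np.unique(protein):
        a = resid[(protein == p) & (treat == 1)]
        b = resid[(protein == p) & (treat == 0)]
        se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
        t_stats[int(p)] = (a.mean() - b.mean()) / se
    return t_stats


# Simulated dye-swap design: 6 gels x 2 dyes x 20 proteins; only protein 0 is DE
rng = np.random.default_rng(1)
n_gels, n_prot = 6, 20
rows = [(g, d, d if g % 2 == 0 else 1 - d, p)
        for g in range(n_gels) for d in (0, 1) for p in range(n_prot)]
gel, dye, treat, protein = (np.array(col) for col in zip(*rows))
y = (rng.normal(0, 0.5, n_gels)[gel] + 0.3 * dye
     + rng.normal(5, 1, n_prot)[protein]
     + 1.0 * ((protein == 0) & (treat == 1))
     + rng.normal(0, 0.1, gel.size))
t = two_stage_dige(y, gel, dye, treat, protein)
print(max(t, key=lambda p: abs(t[p])))  # protein with the strongest contrast
```

Removing gel and dye effects once, from all spots, leaves the per-protein comparison with only the within-protein noise, which is the motivation for splitting the model into two stages.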
Subjects
Two-Dimensional Gel Electrophoresis/methods, Melanoma/metabolism, Proteome/metabolism, Carbocyanines/chemistry, Tumor Cell Line, Computational Biology/methods, Two-Dimensional Gel Electrophoresis/instrumentation, Fluorescent Dyes/chemistry, Humans, Computer-Assisted Image Processing/methods, Linear Models, Proteomics/methods
ABSTRACT
This article presents original geospatial data on the soil adsorption coefficient (Kd) for two herbicides widely used in agriculture, glyphosate and atrazine. Besides Kd values, the dataset includes site-specific soil data: pH, total nitrogen, total organic carbon, Na, K, Ca, Mg, Zn, Mn, Cu, cation exchange capacity, percentage of sand, silt and clay, water holding capacity, aluminum and iron oxides, as well as climatic and topographic variables. Herbicide retention in soil was quantified on a sample of soils selected by the conditioned Latin hypercube method to capture the underlying edaphoclimatic variability in Cordoba, Argentina. The glyphosate data presented here have been used to evaluate statistical methods for model-based digital mapping (F. Giannini Kurina, S. Hang, R. Macchiavelli, M. Balzarini, 2019) [1]. The dataset is made publicly available to enable future analyses of the processes that drive the dynamics of both herbicides in soil.
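Conditioned Latin hypercube sampling picks sites so that, for each covariate, the sample covers the covariate's distribution evenly. The sketch below computes only the continuous-covariate stratification term of the cLHS objective: each covariate is cut into n equal-probability strata and the objective sums |count − 1| over strata. The full method also includes further objective terms (for example, for correlations and categorical covariates) and searches sample sets by simulated annealing; none of that is shown, and the function name is hypothetical.

```python
import numpy as np


def clhs_strata_objective(population, sample_idx):
    """Stratification term of the conditioned Latin hypercube objective.

    population: (n_sites, n_covariates) candidate sites x edaphoclimatic covariates.
    sample_idx: indices of the n selected sites.
    The ideal sample has exactly one site per stratum of every covariate.
    """
    n = len(sample_idx)
    probs = np.linspace(0, 1, n + 1)[1:-1]          # interior quantile levels
    total = 0
    for j in range(population.shape[1]):
        edges = np.quantile(population[:, j], probs)
        strata = np.searchsorted(edges, population[sample_idx, j], side="right")
        counts = np.bincount(strata, minlength=n)
        total += int(np.abs(counts - 1).sum())
    return total


pop = np.arange(1, 101, dtype=float).reshape(-1, 1)   # one covariate, 100 sites
print(clhs_strata_objective(pop, [9, 29, 59, 89]))    # one site per quartile
print(clhs_strata_objective(pop, [0, 1, 2, 3]))       # all sites in the lowest quartile
```

A search procedure would repeatedly swap sites in and out of the sample to drive this value toward zero.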
ABSTRACT
Smut disease caused by the fungal pathogen Thecaphora frezii Carranza & Lindquist threatens peanut production in Argentina. Fungicides commonly used in peanut crops have shown little or no effect in controlling the disease, making it a priority to obtain smut-resistant peanut varieties. In this study, recombinant inbred lines (RILs) were developed from three crosses between three susceptible elite peanut cultivars (Arachis hypogaea L. subsp. hypogaea) and two resistant landraces (Arachis hypogaea L. subsp. fastigiata Waldron). Parents and RILs were evaluated under high inoculum pressure (12,000 teliospores g-1 of soil) over three years. Disease resistance parameters showed a broad range of variation, with mean incidence values ranging from 1.0 to 35.0% and a disease severity index ranging from 0.01 to 0.30. Average heritability (h2) estimates of 0.61 to 0.73 indicated that resistance in the RILs was heritable, with several lines (4 to 7 from each cross) showing a high degree of resistance and stability over three years. Evidence of genetic transfer between genetically distinguishable germplasm (introgression in a broad sense) was further supported by simple-sequence repeat (SSR) and insertion/deletion (InDel) marker genotyping. This is the first report of smut genetic resistance identified in peanut landraces and its introgression into elite peanut cultivars.
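The abstract does not state which heritability estimator was used. A common choice for lines tested over several years with replicates is broad-sense heritability on an entry-mean basis, h² = σ²G / (σ²G + σ²GY/y + σ²e/(r·y)), with variance components taken from a balanced random-effects ANOVA. The sketch below implements that textbook formula as an assumption; the numbers are illustrative only.

```python
def heritability_entry_mean(var_g, var_gy, var_e, n_years, n_reps):
    """Broad-sense heritability on an entry-mean basis across years."""
    phenotypic = var_g + var_gy / n_years + var_e / (n_years * n_reps)
    return var_g / phenotypic


def components_from_mean_squares(ms_g, ms_gy, ms_e, n_years, n_reps):
    """Variance components from a balanced lines x years ANOVA (all effects random)."""
    var_e = ms_e
    var_gy = (ms_gy - ms_e) / n_reps
    var_g = (ms_g - ms_gy) / (n_reps * n_years)
    return var_g, var_gy, var_e


# Illustrative values: 3 years, 2 replicates per year
print(heritability_entry_mean(var_g=1.0, var_gy=0.5, var_e=1.0, n_years=3, n_reps=2))
components = components_from_mean_squares(ms_g=10.0, ms_gy=3.0, ms_e=1.0, n_years=3, n_reps=2)
print(heritability_entry_mean(*components, n_years=3, n_reps=2))
```

With balanced data the second estimate simplifies to 1 − MS_GY/MS_G, so line × year interaction is what pulls h² below 1 when lines are evaluated over several seasons.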
Subjects
Arachis/genetics, Basidiomycota/pathogenicity, Disease Resistance/genetics, Plant Diseases/genetics, Plant Immunity/genetics, Alleles, Arachis/immunology, Arachis/microbiology, Basidiomycota/growth & development, Genetic Crosses, Genetic Markers, Genotype, INDEL Mutation, Microsatellite Repeats, Plant Breeding/methods, Plant Diseases/immunology, Heritable Quantitative Trait
ABSTRACT
Cluster analysis is one of the crucial steps in gene expression pattern (GEP) analysis. It leads to the discovery or identification of temporal patterns and coexpressed genes. GEP analysis involves high-dimensional multivariate data, which demand appropriate tools. A good alternative for grouping many multidimensional objects is the self-organizing map (SOM), an unsupervised neural network algorithm able to find relationships among data. SOM groups the data and maps them topologically. However, it may be difficult to identify clusters with the usual visualization tools for SOM. We propose a simple algorithm to identify and visualize clusters in SOM (the RP-Q method). RP is a new node-adaptive attribute that moves in a two-dimensional virtual space, imitating the movement of the SOM codebook vectors in the input space. The Q statistic evaluates the SOM structure, providing an estimate of the number of clusters underlying the dataset. The SOM-RP-Q algorithm permits the visualization of clusters in the SOM and their node patterns. The algorithm was evaluated on several simulated and real GEP datasets. Results show that the proposed algorithm successfully displays the underlying cluster structure directly from the SOM and is robust to different net sizes.
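The RP attribute and the Q statistic are specific to the proposed method and are not reproduced here. What follows is a minimal sketch of only the standard online Kohonen SOM update on which SOM-RP-Q builds: find the best-matching unit, then pull it and its grid neighbors toward the input. The grid size, learning-rate and neighborhood schedules, and the toy data are assumptions.

```python
import numpy as np


def train_som(data, rows=2, cols=2, n_iter=2000, seed=0):
    """Online Kohonen SOM with a Gaussian neighborhood on a rows x cols grid."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize codebook vectors from randomly chosen data points.
    codebook = data[rng.choice(len(data), size=len(grid), replace=False)].astype(float)
    for it in range(n_iter):
        frac = it / n_iter
        lr = 0.5 * (0.01 / 0.5) ** frac          # learning rate decays 0.5 -> 0.01
        sigma = 1.0 * (0.1 / 1.0) ** frac        # neighborhood radius decays 1 -> 0.1
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))   # best-matching unit
        grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_d2 / (2 * sigma ** 2))             # neighborhood weights
        codebook += lr * h[:, None] * (x - codebook)
    return codebook


def quantization_error(data, codebook):
    """Mean distance from each observation to its nearest codebook vector."""
    d = np.sqrt(((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2))
    return d.min(axis=1).mean()


rng = np.random.default_rng(42)
data = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
codebook = train_som(data)
print(quantization_error(data, codebook))
```

A cluster-visualization layer such as RP-Q would then work on the trained codebook vectors and their grid positions rather than on the raw expression profiles.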
Subjects
Algorithms, Cluster Analysis, Gene Expression Profiling/methods, Animals, Computer Simulation, Statistical Data Interpretation, HL-60 Cells, Hematopoiesis/genetics, Humans, Leukemia/genetics, Rats
ABSTRACT
Garlic can be infected by a number of viruses, including allexiviruses. The coat protein sequence of an Allexivirus was detected in Argentina and deposited in the EMBL database as Garlic mite-borne filamentous virus (accession number X98991); it has high homology with Garlic virus A (GarV-A). For reliable virus detection, plants should be sampled when virus titer is high, to reduce the risk of misidentifying infected plants as healthy. The objective of this study was to describe fluctuations in the concentration of this Argentine isolate of GarV-A in two garlic cultivars, Morado-INTA and Nieve-INTA, throughout the crop cycle using the double-antibody sandwich enzyme-linked immunosorbent assay (DAS-ELISA). Over a 2-year period, for both cultivars, virus concentration was assessed in samples from the tip section of the youngest leaves of GarV-A-infected plants, and from basal sections of both dormant and devernalized cloves of stored Morado-INTA bulbs. The concentration of GarV-A varied during the crop cycle, peaking at the beginning and again at the end of the cycle. Virus concentration was slightly higher in devernalized cloves than in dormant cloves of Morado-INTA. No correlation between virus concentration and mean air temperature was observed. Based on these results, we recommend sampling at the beginning of the crop cycle, 64 to 81 days after planting, and towards the end of the crop cycle to evaluate the presence of GarV-A by DAS-ELISA.
ABSTRACT
The ratio of oleic to linoleic acids (O/L) and the tocopherol content are important features in determining peanut (Arachis hypogaea) seed shelf life. Soluble carbohydrates are known to be important precursors of roasted peanut flavor. The chemical quality of Argentine grain differs from that of other countries, but no previous studies associating grain quality with environmental parameters have been performed. Relationships were determined between O/L, tocopherol and sugar contents and variations in temperature and rainfall during the grain filling period of Florman INTA peanuts. Dry seed yield was used as an additional explanatory variable. The multiple regression procedure selected mean temperature (positive coefficient) and total precipitation (negative coefficient) as the explanatory variables for variations in O/L. Total precipitation and dry seed yield (both with negative coefficients) were found to be predictor variables for tocopherol and sugar contents. Total precipitation was included as an explanatory variable in all of the linear regression models obtained in this study.
Subjects
Arachis/chemistry, Carbohydrates/analysis, Climate, Plant Oils/chemistry, Seeds, Arachis/growth & development, Argentina, Peanut Oil, Quality Control
ABSTRACT
Set enrichment analysis (SEA) is used to identify enriched biological categories/terms within high-throughput differential expression experiments. This is done by evaluating the proportion of differentially expressed genes against a background reference (BR). However, the choice of an "appropriate" BR is a perplexing problem, and the results depend on it. Here, a visualization procedure that integrates results from several BRs with a stability analysis of enriched terms is presented as a tool to aid SEA. The multi-reference contrast method (MRCM) combines results from multiple BRs in a single picture. The application of the proposed method is illustrated with one proteomic and three microarray experiments. The MRCM facilitates the exploratory task involved in ontology analysis of proteomic/genomic experiments, where consensus terms were found to validate the main experimental hypotheses. The use of more than one reference may provide new biological insights. The tool automatically highlights non-consensus terms, assisting SEA.
Subjects
Data Mining/methods, Genetic Databases, Genomics/methods, Terminology as Topic, Animals, Cluster Analysis, Two-Dimensional Gel Electrophoresis, Gene Expression Profiling, Humans, Mice, Theoretical Models, Oligonucleotide Array Sequence Analysis, Proteomics
ABSTRACT
Knowledge of the underlying molecular kinetics is key to developing a dialysis treatment as well as to patient monitoring. In this work, we propose a kinetic inference method that is general enough to be used on different molecular types measured in the spent dialysate. It estimates the number and significance of the compartments involved in the overall dialysis process by means of a spectral deconvolution technique, thereby characterizing the kinetic behavior of the patient. The method was applied to 52 patients to reveal the underlying kinetics from dialysate time-concentration profiles of urea, whose kinetics are well known. Three types of behavior were found: one-compartmental (exponential decay τ = 180 ± 61.64 min), bicompartmental (τ1 = 24.96 ± 19.33 min, τ2 = 222.32 ± 76.59 min), and tricompartmental (τ1 = 23.03 ± 14.21 min, τ2 = 85.75 ± 27.48 min, τ3 = 337 ± 85.52 min). In patients with bicompartmental kinetics, τ2 was related to the level of dialysis dose. The study concluded that the spectral deconvolution technique can be considered a powerful tool for molecular kinetics inference that could be integrated into on-line molecular analysis devices. Furthermore, the method could be used to analyze poorly understood molecules as well as new hemodialysis target biomarkers.
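The compartment time constants above describe a concentration profile of the form C(t) = Σ Aₖ·exp(−t/τₖ). The sketch below does not implement the spectral deconvolution technique of the study. It uses the simpler, classic curve-stripping (exponential peeling) method as a stand-in, only to show how τ1 and τ2 are read off a bicompartmental profile. The window boundaries are assumptions that require τ1 to be much smaller than τ2, and the profile is synthetic, with time constants chosen near the reported bicompartmental means.

```python
import numpy as np


def peel_biexponential(t, conc, late_start, early_end):
    """Curve stripping for C(t) = A1 exp(-t/tau1) + A2 exp(-t/tau2), tau1 << tau2.

    1. Fit log C on the late points (t >= late_start), where the slow term dominates.
    2. Subtract the extrapolated slow term and fit the log of the remainder
       on the early points (t <= early_end).
    """
    late = t >= late_start
    slope2, intercept2 = np.polyfit(t[late], np.log(conc[late]), 1)
    slow = np.exp(intercept2 + slope2 * t)
    early = (t <= early_end) & (conc - slow > 0)
    slope1, intercept1 = np.polyfit(t[early], np.log(conc[early] - slow[early]), 1)
    return {"tau1": -1 / slope1, "A1": np.exp(intercept1),
            "tau2": -1 / slope2, "A2": np.exp(intercept2)}


t = np.arange(0, 301, 10, dtype=float)                   # minutes
conc = 10 * np.exp(-t / 25) + 5 * np.exp(-t / 220)        # synthetic profile
fit = peel_biexponential(t, conc, late_start=150, early_end=50)
print({k: round(float(v), 1) for k, v in fit.items()})
```

Peeling needs the number of compartments and the fitting windows fixed in advance. A spectral approach, as described in the abstract, instead estimates how many compartments are present, which is what allows one-, two- and three-compartment patients to be distinguished.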
Subjects
Artifacts, Online Systems, Renal Dialysis/methods, Adult, Aged, Algorithms, Female, Humans, Kinetics, Male, Metabolic Clearance Rate, Biological Models, Physiologic Monitoring/methods, Urea/analysis, Urea/blood
ABSTRACT
In germplasm collections, the genetic material of interest is characterized through multiple descriptors (variables). Each accession in the collection is represented by a data vector belonging to a multidimensional space. Multidimensional configurations are difficult to interpret because they cannot easily be visualized. One objective of analyzing accessions × descriptors data matrices is to ordinate the genetic material in a two-dimensional space, which is commonly optimal in the sense that it represents the maximum variability. Vector analysis methods, such as Principal Component Analysis (PCA), reduce dimensionality under this optimality criterion. PCA results are visualized by plotting the accessions being ordinated as points in a scatter plot according to their values on the first two principal ordination axes (those of greatest variance). Because information is lost when ordinating in a two-dimensional space, distances in the plane are often not the distances in the original space, which leads to errors in interpreting relationships among accessions. In this work, we quantify the interpretation error in the relationships inferred from the plane generated by the first two principal components (PCs), under simulated scenarios involving different germplasm collection sizes and a wide range of variability explained by the first two PCs (an indirect measure of representation quality). The results suggest that even when these components explain more than 70 percent of the total variability, the interpretation error is statistically greater than 0 and depends on the number of objects ordinated. Minimum Spanning Trees, as a complement to ordinations produced by vector analyses, are an efficient tool for better understanding these ordinations.
The use of this technique in interpreting ordinations produced by PCA is illustrated.
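A minimum spanning tree computed from distances in the full descriptor space, and then drawn over the PC1-PC2 scatter plot, shows which accessions are truly nearest neighbors. Two points that look close in the plane but are not joined by an edge may be far apart in the original space. A minimal sketch of Prim's algorithm on Euclidean distances follows. The function name is hypothetical, and the plotting step is not shown.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on Euclidean distances in the ORIGINAL descriptor space.

    points: list of equal-length numeric sequences (one per accession).
    Returns a list of edges (i, j, distance).
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the closest point not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


# Four accessions described by one descriptor, for a hand-checkable result
print(minimum_spanning_tree([[0], [1], [3], [6]]))
```

In practice, the edges are computed from all descriptors and overlaid as line segments joining the corresponding points of the PCA biplot.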