ABSTRACT
The unprecedented availability of optical satellite data in cloud-based computing platforms, such as Google Earth Engine (GEE), opens new possibilities to develop crop trait retrieval models from the local to the planetary scale. Hybrid retrieval models are of interest to run in these platforms as they combine the advantages of physically-based radiative transfer models (RTM) with the flexibility of machine learning regression algorithms. Previous research with GEE primarily relied on processing bottom-of-atmosphere (BOA) reflectance data, which requires atmospheric correction. In the present study, we implemented hybrid models directly into GEE for processing Sentinel-2 (S2) Level-1C (L1C) top-of-atmosphere (TOA) reflectance data into crop traits. To achieve this, a training dataset was generated using the leaf-canopy RTM PROSAIL in combination with the atmospheric model 6SV. Gaussian process regression (GPR) retrieval models were then established for eight essential crop traits, namely leaf chlorophyll content, leaf water content, leaf dry matter content, fractional vegetation cover, leaf area index (LAI), and upscaled leaf variables (i.e., canopy chlorophyll content, canopy water content and canopy dry matter content). An important prerequisite for implementation into GEE is that the models are sufficiently lightweight to allow efficient and fast processing. The training dataset was successfully reduced by 78% using the active learning technique Euclidean distance-based diversity (EBD). With the EBD-GPR models, highly accurate validation results of LAI and upscaled leaf variables were obtained against in situ field data from the validation study site Munich-North-Isar (MNI), with normalized root mean square errors (NRMSE) from 6% to 13%.
Using an independent validation dataset of similar crop types (Italian Grosseto test site), the retrieval models showed moderate to good performances for canopy-level variables, with NRMSE ranging from 14% to 50%, but failed for the leaf-level estimates. Obtained maps over the MNI site were further compared against Sentinel-2 Level 2 Prototype Processor (SL2P) vegetation estimates generated from the ESA Sentinels' Application Platform (SNAP) Biophysical Processor, showing high consistency between both retrievals (R2 from 0.80 to 0.94). Finally, thanks to the seamless GEE processing capability, the TOA-based mapping was applied over the entirety of Germany at 20 m spatial resolution, including information about prediction uncertainty. The obtained maps provided confidence in the developed EBD-GPR retrieval models for integration in the GEE framework and national scale mapping from S2-L1C imagery. In summary, the proposed retrieval workflow demonstrates the possibility of routine processing of S2 TOA data into crop trait maps at any place on Earth, as required for operational agricultural applications.
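The NRMSE values quoted above can be illustrated with a minimal sketch. The abstract does not state which normalization was used, so the version below assumes normalization by the range of the observed values; the function name `nrmse_percent` and the sample numbers are hypothetical.

```python
import numpy as np

def nrmse_percent(observed, predicted):
    """Return the RMSE as a percentage of the observed value range.

    Assumption: range-based normalization. Normalizing by the mean of the
    observations is another common choice and gives different numbers.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((predicted - observed) ** 2))
    value_range = observed.max() - observed.min()
    return 100.0 * rmse / value_range

# Hypothetical LAI measurements (m2/m2) and model estimates
lai_in_situ = [1.0, 2.0, 3.0, 4.0, 5.0]
lai_estimated = [1.5, 2.5, 3.5, 4.5, 5.5]
print(nrmse_percent(lai_in_situ, lai_estimated))
```

Because the result depends on the chosen normalization, NRMSE values from different studies are only comparable when the same convention is used.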
ABSTRACT
Spaceborne imaging spectroscopy is a highly promising data source for all agricultural management and research disciplines that require spatio-temporal information on crop properties. Recently launched science-driven missions, such as the Environmental Mapping and Analysis Program (EnMAP), deliver unprecedented data from the Earth's surface. This new kind of data should be explored to develop robust retrieval schemes for deriving crucial variables from future routine missions. Therefore, we present a workflow for inferring crop carbon content (Carea), and aboveground dry and wet biomass (AGBdry, AGBfresh) from EnMAP data. To achieve this, a hybrid workflow was generated, combining radiative transfer modeling (RTM) with machine learning regression algorithms. The key concept involves: (1) coupling the RTMs PROSPECT-PRO and 4SAIL for simulation of a wide range of vegetation states, (2) using dimensionality reduction to deal with collinearity, (3) applying a semi-supervised active learning technique against a 4-year campaign dataset, followed by (4) training of a Gaussian process regression (GPR) machine learning algorithm and (5) validation with an independent in situ dataset acquired during the ESA Hypersense experiment campaign at a German test site. Internal validation of the GPR-Carea and GPR-AGB models achieved coefficients of determination (R2) of 0.80 for Carea, and of 0.80 and 0.71 for AGBdry and AGBfresh, respectively. The mapping capability of these models was successfully demonstrated using airborne AVIRIS-NG hyperspectral imagery, which was spectrally resampled to EnMAP spectral properties. Plausible estimates were achieved over both bare and green fields after adding bare soil spectra to the training data. Validation over green winter wheat fields generated reliable estimates, as suggested by low associated model uncertainties (< 40%).
These results suggest a high degree of model reliability for cultivated areas during active growth phases at canopy closure. Overall, our proposed carbon and biomass models based on EnMAP spectral sampling demonstrate a promising path toward the inference of these crucial variables over cultivated areas from future spaceborne operational hyperspectral missions.
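Step (2) of the workflow above, dimensionality reduction to deal with collinear bands, can be sketched as follows. The abstract does not name the technique; principal component analysis (PCA) is assumed here purely for illustration, and the function name and synthetic spectra are hypothetical.

```python
import numpy as np

def pca_fit_transform(spectra, n_components):
    """Project spectra onto their first principal components.

    Assumption: PCA stands in for the unspecified dimensionality
    reduction step. Rows are samples, columns are spectral bands.
    """
    spectra = np.asarray(spectra, dtype=float)
    band_mean = spectra.mean(axis=0)
    centered = spectra - band_mean
    # SVD of the centered data: rows of vt are the principal axes
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return scores, components, band_mean, explained[:n_components]

# Synthetic, strongly collinear "spectra": four bands driven by one factor
rng = np.random.default_rng(0)
factor = rng.normal(size=(50, 1))
spectra = factor @ np.array([[1.0, 2.0, 3.0, 4.0]]) + 0.01 * rng.normal(size=(50, 4))
scores, components, band_mean, explained = pca_fit_transform(spectra, 2)
print(scores.shape, explained)
```

The resulting score columns are uncorrelated by construction, so a regression model trained on them no longer sees the band-to-band collinearity of the original spectra.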
ABSTRACT
The current exponential increase of spatiotemporally explicit data streams from satellite-based Earth observation missions offers promising opportunities for global vegetation monitoring. Intelligent sampling through active learning (AL) heuristics provides a pathway for fast inference of essential vegetation variables by means of hybrid retrieval approaches, i.e., machine learning regression algorithms trained by radiative transfer model (RTM) simulations. In this study we summarize AL theory and perform a brief systematic literature survey about AL heuristics used in the context of Earth observation regression problems over terrestrial targets. Across all relevant studies it appeared that: (i) models trained on AL-optimized training data sets achieved higher retrieval accuracy than models trained on large randomly sampled data sets, and (ii) the Euclidean distance-based diversity (EBD) method tends to be the most efficient AL technique in terms of accuracy and computational demand. Additionally, a case study is presented based on experimental data employing both uncertainty and diversity AL criteria. Here, a simulated training database generated with the PROSAIL-PRO canopy RTM is used to demonstrate the benefit of AL techniques for the estimation of total leaf carotenoid content (Cxc) and leaf water content (Cw). Gaussian process regression (GPR) was incorporated to minimize and optimize the training data set with AL. Training the GPR algorithm on optimally AL-based sampled data sets led to improved variable retrievals compared to training on full data pools, which is further demonstrated on a mapping example. From these findings we can recommend the use of AL-based sub-sampling procedures to select the most informative samples out of large training data pools. This will not only optimize regression accuracy due to exclusion of redundant information, but also speed up processing time and reduce final model size of kernel-based machine learning regression algorithms, such as GPR.
With this study we want to encourage further testing and implementation of AL sampling methods for hybrid retrieval workflows. AL can contribute to the solution of regression problems within the framework of operational vegetation monitoring using satellite imaging spectroscopy data, and may strongly facilitate data processing for cloud-computing platforms.
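The diversity criterion behind EBD can be illustrated with a minimal sketch. It assumes a greedy variant that repeatedly adds the pool sample farthest, in Euclidean distance, from the samples already selected. The accuracy check that AL workflows typically run before accepting a candidate is omitted, and the function name `ebd_select` and the toy pool are hypothetical.

```python
import numpy as np

def ebd_select(pool, initial_idx, n_add):
    """Greedy diversity sampling: add the pool sample that is farthest
    (Euclidean distance) from the already selected training samples.

    Assumption: pure distance-based selection, without the validation
    step that would decide whether a candidate is kept.
    """
    pool = np.asarray(pool, dtype=float)
    selected = list(initial_idx)
    # Distance from every pool sample to its nearest selected sample
    min_dist = np.full(len(pool), np.inf)
    for i in selected:
        min_dist = np.minimum(min_dist, np.linalg.norm(pool - pool[i], axis=1))
    for _ in range(n_add):
        min_dist[selected] = -np.inf  # never pick a sample twice
        candidate = int(np.argmax(min_dist))
        selected.append(candidate)
        dist_to_new = np.linalg.norm(pool - pool[candidate], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected

# Toy pool of two-band "spectra"; sample 1 is nearly redundant with sample 0
pool = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(ebd_select(pool, initial_idx=[0], n_add=2))
```

In this toy pool the near-duplicate sample is skipped in favor of more distant ones, which is how redundant simulations are kept out of the reduced training set.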
ABSTRACT
Non-photosynthetic vegetation (NPV) biomass has been identified as a priority variable for upcoming spaceborne imaging spectroscopy missions, calling for a quantitative estimation of lignocellulosic plant material as opposed to the sole indication of surface coverage. Therefore, we propose a hybrid model for the retrieval of non-photosynthetic cropland biomass. The workflow included coupling the leaf optical model PROSPECT-PRO with the canopy reflectance model 4SAIL, which allowed us to simulate NPV biomass from carbon-based constituents (CBC) and leaf area index (LAI). PROSAIL-PRO provided a training database for a Gaussian process regression (GPR) algorithm, simulating a wide range of non-photosynthetic vegetation states. Active learning was employed to reduce and optimize the training data set. In addition, we applied spectral dimensionality reduction to condense essential information of non-photosynthetic signals. The resulting NPV-GPR model was successfully validated against soybean field data with normalized root mean square error (nRMSE) of 13.4% and a coefficient of determination (R2) of 0.85. To demonstrate mapping capability, the NPV-GPR model was tested on a PRISMA hyperspectral image acquired over agricultural areas in the North of Munich, Germany. Reliable estimates were mainly achieved over senescent vegetation areas as suggested by model uncertainties. The proposed workflow is the first step towards the quantification of non-photosynthetic cropland biomass as a next-generation product from near-term operational missions, such as CHIME.
ABSTRACT
In support of cropland monitoring, operational Copernicus Sentinel-2 (S2) data became available globally and can be explored for the retrieval of important crop traits. Based on a hybrid workflow, retrieval models for six essential biochemical and biophysical crop traits were developed for both S2 bottom-of-atmosphere (BOA) L2A and S2 top-of-atmosphere (TOA) L1C data. A variational heteroscedastic Gaussian process regression (VHGPR) algorithm was trained with simulations generated by the combined leaf-canopy reflectance model PROSAIL at the BOA scale and further combined with the Second Simulation of a Satellite Signal in the Solar Spectrum (6SV) atmosphere model at the TOA scale. Established VHGPR models were then applied to S2 L1C and L2A reflectance data for mapping: leaf chlorophyll content (Cab), leaf water content (Cw), fractional vegetation coverage (FVC), leaf area index (LAI), and upscaled leaf biochemical compounds, i.e., LAI * Cab (laiCab) and LAI * Cw (laiCw). Estimated variables were validated using in situ reference data collected during the Munich-North-Isar field campaigns within growing seasons of maize and winter wheat in the years 2017 and 2018. For leaf biochemicals, retrieval from BOA reflectance slightly outperformed results from TOA reflectance, e.g., obtaining a root mean squared error (RMSE) of 6.5 µg/cm2 (BOA) vs. 8 µg/cm2 (TOA) in the case of Cab. For the majority of canopy-level variables, in contrast, estimation accuracy was higher when using TOA reflectance data, e.g., with an RMSE of 139 g/m2 (BOA) vs. 113 g/m2 (TOA) for laiCw. Derived maps were further compared against reference products obtained from the ESA Sentinel Application Platform (SNAP) Biophysical Processor. Altogether, the consistency between L1C and L2A retrievals confirmed that crop traits can potentially be estimated directly from TOA reflectance data.
Successful mapping of canopy-level crop traits including information about prediction confidence suggests that the models can be transferred over spatial and temporal scales and, therefore, can contribute to decision-making processes for cropland management.
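The upscaling of leaf variables to the canopy level (laiCab = LAI * Cab, laiCw = LAI * Cw) is simple arithmetic, sketched below. The unit conventions are assumptions: Cab is taken in µg/cm2 as in the RMSE values above, Cw is assumed to be in g/cm2, and canopy values are expressed in g/m2. The function names and sample values are hypothetical.

```python
# Unit conversions: 1 µg/cm2 = 1e-6 g * 1e4 cm2/m2 = 0.01 g/m2 of leaf area
UG_PER_CM2_TO_G_PER_M2 = 0.01
# 1 g/cm2 = 1e4 g/m2 of leaf area
G_PER_CM2_TO_G_PER_M2 = 1.0e4

def canopy_chlorophyll(lai, cab_ug_cm2):
    """laiCab in g per m2 of ground: LAI (m2 leaf per m2 ground) times Cab."""
    return lai * cab_ug_cm2 * UG_PER_CM2_TO_G_PER_M2

def canopy_water(lai, cw_g_cm2):
    """laiCw in g per m2 of ground; assumes Cw is given in g/cm2."""
    return lai * cw_g_cm2 * G_PER_CM2_TO_G_PER_M2

# Hypothetical canopy: LAI = 3, Cab = 40 µg/cm2, Cw = 0.02 g/cm2
print(canopy_chlorophyll(3.0, 40.0))
print(canopy_water(3.0, 0.02))
```

Under these assumed units, errors in LAI and in the leaf-level variable both propagate into the product, which is one reason canopy-level and leaf-level accuracies can differ.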
ABSTRACT
Nitrogen (N) is considered as one of the most important plant macronutrients and proper management of N therefore is a pre-requisite for modern agriculture. Continuous satellite-based monitoring of this key plant trait would help to understand individual crop N use efficiency and thus would enable site-specific N management. Since hyperspectral imaging sensors could provide detailed measurements of spectral signatures corresponding to the optical activity of chemical constituents, they have a theoretical advantage over multi-spectral sensing for the detection of crop N. The current study aims to provide a state-of-the-art overview of crop N retrieval methods from hyperspectral data in the agricultural sector and in the context of future satellite imaging spectroscopy missions. Over 400 studies were reviewed for this purpose, identifying those estimating mass-based N (N concentration, N%) and area-based N (N content, Narea) using hyperspectral remote sensing data. Retrieval methods of the 125 studies selected in this review can be grouped into: (1) parametric regression methods, (2) linear nonparametric regression methods or chemometrics, (3) nonlinear nonparametric regression methods or machine learning regression algorithms, (4) physically-based or radiative transfer models (RTM), (5) use of alternative data sources (sun-induced fluorescence, SIF) and (6) hybrid or combined techniques. Whereas in the last decades methods for estimation of Narea and N% from hyperspectral data have been mainly based on simple parametric regression algorithms, such as narrowband vegetation indices, there is an increasing trend of using machine learning, RTM and hybrid techniques. Within plants, N is invested in proteins and chlorophylls stored in the leaf cells, with the proteins being the major nitrogen-containing biochemical constituent. 
However, in most studies, the relationship between N and chlorophyll content was used to estimate crop N, focusing on the visible-near infrared (VNIR) spectral domains, and thus neglecting protein-related N and reallocation of nitrogen to non-photosynthetic compartments. Therefore, we recommend exploiting the estimation of nitrogen via the proxy of proteins using hyperspectral data and in particular the short-wave infrared (SWIR) spectral domain. We further strongly encourage a standardization of nitrogen terminology, distinguishing between N% and Narea. Moreover, the exploitation of physically-based approaches is highly recommended combined with machine learning regression algorithms, which represents an interesting perspective for future research in view of new spaceborne imaging spectroscopy sensors.
ABSTRACT
Hyperspectral acquisitions have proven to be the most informative Earth observation data source for the estimation of nitrogen (N) content, which is the main limiting nutrient for plant growth and thus agricultural production. In the past, empirical algorithms have been widely employed to retrieve information on this biochemical plant component from canopy reflectance. However, these approaches do not seek a cause-effect relationship based on physical laws. Moreover, most studies solely relied on the correlation of chlorophyll content with nitrogen, and thus neglected the fact that most N is bound in proteins. Our study presents a hybrid retrieval method using a physically-based approach combined with machine learning regression to estimate crop N content. Within the workflow, the leaf optical properties model PROSPECT-PRO, including the newly calibrated specific absorption coefficients (SAC) of proteins, was coupled with the canopy reflectance model 4SAIL to form PROSAIL-PRO. The latter was then employed to generate a training database to be used for advanced probabilistic machine learning methods: a standard homoscedastic Gaussian process (GP) and a heteroscedastic GP regression that accounts for signal-to-noise relations. Both GP models have the property of providing confidence intervals for the estimates, which sets them apart from other machine learners. Moreover, a GP-based sequential backward band removal algorithm was employed to analyze the band-specific information content of PROSAIL-PRO simulated spectra for the estimation of aboveground N. Data from multiple hyperspectral field campaigns, carried out in the framework of the future satellite mission Environmental Mapping and Analysis Program (EnMAP), were exploited for validation. In these campaigns, corn and winter wheat spectra were acquired to simulate spectral EnMAP data.
Moreover, destructive N measurements from leaves, stalks and fruits were collected separately to enable plant-organ-specific validation. The results showed that both GP models can provide accurate aboveground N simulations, with slightly better results for the heteroscedastic GP in terms of model testing and against in situ N measurements from leaves plus stalks, with a root mean square error (RMSE) of 2.1 g/m2. However, the inclusion of fruit N content for validation deteriorated the results, which can be explained by the inability of the radiation to penetrate the thick tissues of stalks, corn cobs and wheat ears. GP-based band analysis identified optimal spectral settings with ten bands mainly situated in the shortwave infrared (SWIR) spectral region. Use of well-known protein absorption bands from the literature showed comparable results. Finally, the heteroscedastic GP model was successfully applied to airborne hyperspectral data for N mapping. We conclude that GP algorithms, and in particular the heteroscedastic GP, should be implemented for global agricultural monitoring of aboveground N from future imaging spectroscopy data.
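The property that GP models return an uncertainty alongside each estimate can be illustrated with a minimal sketch of a standard homoscedastic GP. The squared-exponential kernel, the fixed hyperparameters and the toy one-dimensional data are assumptions made for illustration; the heteroscedastic variant and hyperparameter optimization are not covered.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel with unit signal variance (assumed)."""
    sq_dist = (np.sum(a ** 2, axis=1)[:, None]
               + np.sum(b ** 2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale ** 2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-2, length_scale=1.0):
    """Posterior mean and standard deviation of a zero-mean GP with a
    single, input-independent noise variance (homoscedastic)."""
    k_train = rbf_kernel(x_train, x_train, length_scale)
    k_train += noise_var * np.eye(len(x_train))
    k_cross = rbf_kernel(x_test, x_train, length_scale)
    chol = np.linalg.cholesky(k_train)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_cross @ alpha
    v = np.linalg.solve(chol, k_cross.T)
    # Prior variance is 1 because the kernel has unit signal variance
    variance = 1.0 - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(variance, 0.0))

# Toy data: three training points; one test point near them, one far away
x_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.sin(x_train[:, 0])
x_test = np.array([[1.0], [10.0]])
mean, std = gp_predict(x_train, y_train, x_test)
print(mean, std)
```

The predictive standard deviation grows for inputs far from the training data, which is the kind of behavior that makes per-pixel uncertainty maps useful for judging where estimates are reliable.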