Wine yield forecasts, weather data, satellite imagery, machine learning, agricultural yield, Python, proxies, NDVI classifications per pixel, statistics computing, criterion production, algorithms, Multiple Linear Regression
Yield forecast methods based on satellite imagery are numerous, and they are usually built around diverse machine learning approaches. This internship was an introduction to developing a complete remote sensing workflow, from retrieving the input data to producing the final forecast, which is to be incorporated into a wider company data-production system. The workflow is not tied to a particular agricultural domain, even though it was mainly tested on vineyard data. The main distinctive feature of the approach lies in the data pre-processing: an innovative method based on detecting particular pixels, called "proxies". Rather than obtaining accurate results at all costs, the main objective of this internship was to characterise this new approach and to estimate whether it could be efficient and suitable for agricultural yield forecasts.
[...] The MLP has one main drawback: without larger amounts of data (especially for the longer configurations), it performs poorly, finding patterns that do not exist at all. As for the SVM, its predictions are too homogeneous to be helpful: a good accuracy is of little use if the mean is always predicted. That said, better algorithms, or even better algorithm configurations, would not drastically improve the quality of the results. The differences remain slight, and no method stands out from the others. They all struggled to overcome this "error barrier" of 10 hl/ha. [...]
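A simple sanity check for the "mean is always predicted" problem is to compare a model with a naive baseline that always forecasts the mean of the other years, and to measure how much the forecasts vary relative to the observed yields. The sketch below is not taken from the report: it assumes yields in hl/ha and leave-one-year-out evaluation, and both function names are hypothetical.

```python
import numpy as np


def loyo_mean_baseline_rmse(yields):
    """RMSE of a naive forecast that always predicts the mean of the other years.

    Leave-one-year-out: each year is forecast with the mean of the remaining
    years (needs at least 2 years).
    """
    y = np.asarray(yields, dtype=float)
    n = y.size
    # Mean of the other n - 1 years, computed for every held-out year at once.
    loo_means = (y.sum() - y) / (n - 1)
    return float(np.sqrt(np.mean((y - loo_means) ** 2)))


def prediction_spread_ratio(predictions, yields):
    """Standard deviation of the forecasts relative to that of the observed yields.

    A ratio close to 0 flags forecasts that barely move away from the mean.
    """
    return float(np.std(predictions) / np.std(yields))
```

A model whose RMSE is close to this baseline's, and whose spread ratio is near 0, is essentially forecasting the mean, whatever its apparent accuracy.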
[...] The best "predictors" can be spotted immediately: the pixels with the lowest error. By repeating this step over all training years, we can compute an RMSE per pixel and detect which pixels offer the lowest errors over the years. Note that although this step speaks of training datasets, testing years, etc., we are still in the proxy selection phase, not in the final prediction phase. In a way, the cross-validation step is "hidden" inside the proxy selection. [...]
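One plausible reading of this leave-one-year-out proxy selection can be sketched in Python with NumPy. The per-pixel error used in the report is elided in this excerpt, so the sketch assumes it is the difference between a pixel's class and the ground-truth class for the held-out year, with class boundaries computed as quantiles of the training years only. The names loyo_pixel_rmse and select_proxies, and the number of proxies kept (k), are hypothetical.

```python
import numpy as np


def loyo_pixel_rmse(yields, ndvi_sums, n_classes=5):
    """Leave-one-year-out RMSE per pixel, in class units (assumed error measure).

    yields:    shape (n_years,)           annual ground-truth yields
    ndvi_sums: shape (n_years, n_pixels)  annual NDVI sum of every pixel
    """
    yields = np.asarray(yields, dtype=float)
    ndvi_sums = np.asarray(ndvi_sums, dtype=float)
    n_years = yields.shape[0]
    # Inner quantile levels splitting the training years into n_classes groups.
    levels = np.linspace(0.0, 1.0, n_classes + 1)[1:-1]
    sq_errors = np.empty_like(ndvi_sums)
    for test in range(n_years):
        train = np.arange(n_years) != test
        # Class boundaries learned on the training years only.
        y_bounds = np.quantile(yields[train], levels)             # (n_classes-1,)
        p_bounds = np.quantile(ndvi_sums[train], levels, axis=0)  # (n_classes-1, n_pixels)
        # Class of the held-out year = number of boundaries it exceeds.
        y_class = np.sum(yields[test] > y_bounds)
        p_class = np.sum(ndvi_sums[test] > p_bounds, axis=0)
        sq_errors[test] = (p_class - y_class) ** 2
    # Aggregate the per-year errors into one RMSE per pixel.
    return np.sqrt(sq_errors.mean(axis=0))


def select_proxies(rmse, k):
    """Indices of the k pixels with the lowest RMSE (the candidate proxies)."""
    return np.argsort(rmse, kind="stable")[:k]
```

Because the held-out year never contributes to its own class boundaries, each fold behaves like a small cross-validation, which matches the idea of a validation step "hidden" in the proxy selection.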
[...] Figure: Blueprint of the NDVI sums computing step
Classification - NDVI and ground truth
The proxy selection step (see next subsection) relies on comparing classifications: that of the reference data (annual yields) and that of each pixel (annual NDVI sums). Let us now build these classifications. First, the ground truth. Suppose we are working in a given Land, Burgenland for example. We have X wine yields, X being the number of years of study. The classes (previously chosen in I) are adapted to this number of years. We begin by sorting the annual yields in ascending order. [...]
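The NDVI-sum and ranking steps can be sketched as follows. The sketch assumes red and near-infrared reflectance stacks with one row per acquisition date and one column per pixel, cloud-masked values stored as NaN, and the standard NDVI = (NIR - Red) / (NIR + Red). The rule that adapts the classes to the number of years is elided in this excerpt, so rank_classes simply splits the ordered years into groups of nearly equal size; both function names are hypothetical.

```python
import numpy as np


def annual_ndvi_sum(red, nir):
    """Sum of NDVI over all acquisition dates of one year, per pixel.

    red, nir: reflectance stacks of shape (n_dates, n_pixels).
    """
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    ndvi = (nir - red) / (nir + red)
    # nansum skips dates where a pixel is masked (assumed stored as NaN).
    return np.nansum(ndvi, axis=0)


def rank_classes(values, n_classes):
    """Class 0..n_classes-1 of each year, from its rank in ascending order.

    values: shape (n_years,) for yields, or (n_years, n_pixels) for NDVI sums;
    years are ranked along axis 0.
    """
    values = np.asarray(values, dtype=float)
    n_years = values.shape[0]
    # argsort of argsort gives each year's rank (0 = lowest value).
    ranks = np.argsort(np.argsort(values, axis=0, kind="stable"), axis=0, kind="stable")
    # Split the ordered years into n_classes groups of (nearly) equal size.
    return ranks * n_classes // n_years
```

Stacking the per-year sums with np.stack gives the (years, pixels) array that rank_classes classifies column by column, so the yields and every pixel end up in the same class system.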
[...] It only equalized the forecasts overall, degrading the good ones and improving the bad ones. The differences between the real yields of the Länder are too large for it to be effective: yields do not follow the same patterns over the years, peaks do not occur at the same time, etc. Ultimately this blurs the results, since the algorithms have to deal with widely varying values for the same years, as well as very different correlation percentages, which does not help them find the best solution. [...]
[...] Appendices
Example of differences between class-system correlations
This table is to be compared with the Table in subsubsection I. Here, the correlations are very close with 5 and 7 classes, but those with 3 classes are completely different. This does not mean that the 5-class and 7-class systems are better, only that the differences are noticeable. [...]
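A comparison of this kind can be sketched by classifying the yields and one pixel's annual NDVI sums under several class systems and correlating the two class vectors. The sketch assumes rank-based classes of nearly equal size, a Pearson correlation computed with numpy.corrcoef, and at least as many years as classes; the report's exact correlation measure is not given in this excerpt, and the function names are hypothetical.

```python
import numpy as np


def rank_classes(values, n_classes):
    """Class 0..n_classes-1 of each year, from its rank in ascending order."""
    values = np.asarray(values, dtype=float)
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    return ranks * n_classes // values.size


def class_correlations(yields, pixel_ndvi_sums, class_counts=(3, 5, 7)):
    """Pearson correlation between yield classes and one pixel's NDVI-sum classes.

    Returns a dict {number_of_classes: correlation}.
    """
    out = {}
    for n in class_counts:
        y_cls = rank_classes(yields, n)
        p_cls = rank_classes(pixel_ndvi_sums, n)
        # Off-diagonal entry of the 2x2 correlation matrix.
        out[n] = float(np.corrcoef(y_cls, p_cls)[0, 1])
    return out
```

Running it over the same pixels with different class_counts shows how sensitive the correlations are to the number of classes, which is the point this appendix table makes.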