Crop Yield Prediction Using ML And Flask Deployment

Introduction

Crop yield prediction is an essential predictive analytics technique in the agriculture industry. It helps farmers and farming businesses predict the crop yield for a particular season and decide when to plant a crop and when to harvest it for a better yield. More broadly, predictive analytics is a powerful tool for improving decision-making in agriculture: besides crop yield prediction, it can be used for risk mitigation, reducing fertilizer costs, and more. This crop yield prediction project using ML and Flask deployment analyzes factors such as weather conditions, bee density, fruit set, and fruit mass.


Learning Objectives

We will briefly go through the end-to-end project to predict crop yield using pollination simulation modeling.

We will follow each step of the data science project lifecycle including data exploration, pre-processing, modeling, evaluation, and deployment.

Finally, we will deploy the model as a Flask API on a cloud service platform called Render.

So let’s get started with this exciting real-world problem statement.

This article was published as a part of the Data Science Blogathon.

Project Description of Crop Yield Prediction

The dataset used for this project was generated with a spatially explicit simulation model to study how the following factors, in isolation and in combination, affect the pollination efficiency and yield of wild blueberry in the agricultural ecosystem:

Plant spatial arrangement

Outcrossing and self-pollination

Bee species compositions

Weather conditions

The simulation model has been validated against field observations and experimental data collected in Maine, USA, and the Canadian Maritimes over the last 30 years, and it is now a useful tool for hypothesis testing and for estimating wild blueberry yield. Because the model is grounded in field data, the simulated dataset gives researchers a realistic basis for crop yield prediction experiments, and it gives developers and data scientists data for building real-world machine learning models for crop yield prediction.

Figure: A simulated wild blueberry field

What is the Pollination Simulation Model?

Pollination simulation modeling is the process of using computer models to simulate the process of pollination. There are various use cases of pollination simulation such as:

Studying the effects of different factors on pollination, such as climate change, habitat loss, and pesticides

Designing pollination-friendly landscapes

Predicting the impact of pollination on crop yields

Pollination simulation models can be used to study the movement of pollen grains between flowers, the timing of pollination events, and the effectiveness of different pollination strategies. This information can be used to improve pollination rates and crop yields, which in turn helps farmers produce crops efficiently with optimal yield.

Pollination simulation models are still under development, but they have the potential to play an important role in the future of agriculture. By understanding how pollination works, we can better protect and manage this essential process.

In our project, we will use a dataset with various features like ‘clonesize’, ‘honeybee’, ‘RainingDays’, ‘AverageRainingDays’, etc., which were created using a pollination simulation process to estimate crop yield.

Problem Statement

In this project, our task is to predict the yield variable (the target feature) from the 16 other features listed in the Data Description, working through the project step by step. Since yield is a continuous quantity, this is a regression problem, and the evaluation metric is the root mean squared error (RMSE). We will deploy the model using Python’s Flask framework on a cloud-based platform.
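RMSE is the square root of the mean of the squared differences between actual and predicted yields, so it is expressed in the same units as the yield itself. A minimal NumPy sketch follows; the `rmse` helper is our own illustration rather than code from the original project, and it gives the same value as `np.sqrt(mean_squared_error(y_true, y_pred))` from scikit-learn:

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# errors of 3 and 4 give sqrt((9 + 16) / 2), roughly 3.536
print(rmse([0.0, 0.0], [3.0, 4.0]))
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.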

Pre-requisites

This project is well suited for intermediate learners of data science and machine learning who want to build portfolio projects. Beginners in the field can also take it up if they are familiar with the following skills:

Knowledge of the Python programming language and of machine learning algorithms using the scikit-learn library

Basic understanding of web development using Python’s Flask framework

Understanding of regression evaluation metrics

Data Description

In this section, we describe each variable in the dataset used for our project.

Clonesize — m² — The average blueberry clone size in the field

Honeybee — bees/m²/min — Honeybee density in the field

Bumbles — bees/m²/min — Bumblebee density in the field

Andrena — bees/m²/min — Andrena bee density in the field

Osmia — bees/m²/min — Osmia bee density in the field

MaxOfUpperTRange — ℃ — The highest record of the upper band daily air temperature during the bloom season

MinOfUpperTRange — ℃ — The lowest record of the upper band daily air temperature

AverageOfUpperTRange — ℃ — The average of the upper band daily air temperature

MaxOfLowerTRange — ℃ — The highest record of the lower band daily air temperature

MinOfLowerTRange — ℃ — The lowest record of the lower band daily air temperature

AverageOfLowerTRange — ℃ — The average of the lower band daily air temperature

RainingDays — Day — The total number of days during the bloom season, each of which has precipitation larger than zero

AverageRainingDays — Day — The average of rainy days in the entire bloom season

Fruitset — Transitioning time of fruit set

Fruitmass — Mass of the fruit set

Seeds — Number of seeds in fruitset

Yield — Crop yield (target variable)
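Before modeling, the target has to be separated from the features. A minimal pandas sketch follows, assuming the CSV headers match the names below: ‘clonesize’, ‘honeybee’, ‘andrena’, ‘RainingDays’ and ‘AverageRainingDays’ appear in this article, while the exact spelling of the remaining headers is an assumption to verify against `df.columns`. The helper `split_features_target` is hypothetical, not part of the original project.

```python
import pandas as pd

# Assumed column names: some appear in the article, the rest follow the
# Data Description and should be checked against df.columns.
FEATURES = [
    "clonesize", "honeybee", "bumbles", "andrena", "osmia",
    "MaxOfUpperTRange", "MinOfUpperTRange", "AverageOfUpperTRange",
    "MaxOfLowerTRange", "MinOfLowerTRange", "AverageOfLowerTRange",
    "RainingDays", "AverageRainingDays",
    "fruitset", "fruitmass", "seeds",
]
TARGET = "yield"


def split_features_target(df: pd.DataFrame):
    """Return (X, y), failing loudly if any expected column is missing."""
    missing = [c for c in FEATURES + [TARGET] if c not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return df[FEATURES], df[TARGET]
```

Once the data is loaded (next section), `X, y = split_features_target(df)` gives the 16 input features and the yield target.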

What is the value of this data for the crop yield prediction use case?

This dataset provides practical information on wild blueberry plant spatial traits, bee species, and weather situations. Therefore, it enables researchers and developers to build machine learning models for early prediction of blueberry yield.

This dataset can be valuable for researchers who have field observation data but want to test and evaluate the performance of different machine learning algorithms by comparing real data against computer-simulated data as input for crop yield prediction.

Educators at different levels can use the dataset to teach machine learning classification or regression problems in the agricultural domain.

Loading Dataset

In this section, we load the dataset in whichever environment you are working in: you can use the dataset directly in a Kaggle notebook, or download it and run the project on your local machine.

Dataset source: the Wild Blueberry Pollination Simulation dataset on Kaggle

Let’s look at the code that imports the libraries for the project and loads the dataset.

```python
import os

import joblib  # used later to save the trained model
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import shap
import statsmodels.api as sm
from sklearn.cluster import KMeans
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import (GridSearchCV, KFold, RepeatedKFold,
                                     cross_val_score, train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from xgboost import XGBRegressor

# list the input files available in the Kaggle environment
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# read the CSV file (using the 'Row#' column as the index) and show the first 5 rows
df = pd.read_csv("/kaggle/input/wildblueberrydatasetpollinationsimulation/WildBlueberryPollinationSimulationData.csv", index_col="Row#")
df.head()
```

This code snippet imports the libraries, prints the paths of the input files available in the Kaggle environment, then reads the CSV file and displays the first 5 rows of the dataset. The project then explores Ordinary Least Squares (OLS) regression with the statsmodels library and uses the SHAP model explainer to visually identify the most important features for predicting the target, crop yield.

Machine Learning Modeling Baseline

When comparing the baseline regression models, we observed the lowest mean squared error with the gradient boosting regressor and the highest with the AdaBoost regressor. Next, we train the gradient boosting model and assess its error on a hold-out set created with scikit-learn’s train_test_split function.
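A model comparison of this kind can be sketched with `cross_val_score` and `KFold`. This is an assumption about how the comparison was run: synthetic data from `make_regression` stands in for the 16 blueberry features, and each model uses default or minimal hyperparameters.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the 16 blueberry features (assumption)
X, y = make_regression(n_samples=300, n_features=16, noise=10.0, random_state=42)

models = {
    "AdaBoost": AdaBoostRegressor(random_state=42),
    "GradientBoosting": GradientBoostingRegressor(random_state=42),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=42),
}
cv = KFold(n_splits=5, shuffle=True, random_state=42)

results = {}
for name, model in models.items():
    # scikit-learn maximizes scores, so MSE is returned as a negative number
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    results[name] = -scores.mean()
    print(f"{name}: mean CV MSE = {results[name]:.1f}")

print("lowest error:", min(results, key=results.get))
```

Using the same `KFold` object for every model keeps the folds identical, so the error differences reflect the models rather than the splits.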

After splitting the data into training and testing sets, we applied the gradient boosting regressor and evaluated its performance. The gradient boosting model achieved an RMSE of approximately 363 and an R² score of around 0.93 (about 93% of the variance in yield explained), an improvement over the baseline. Hyperparameter tuning can reduce the error further.
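The train/test evaluation can be sketched as follows. Synthetic data replaces the blueberry features, and the 80/20 split ratio and `random_state` values are assumptions, so the printed scores will not match the article's numbers.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the blueberry data (assumption)
X, y = make_regression(n_samples=500, n_features=16, noise=20.0, random_state=0)

# hold out 20% of the rows for evaluation (split ratio is an assumption)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

gbr = GradientBoostingRegressor(random_state=0)
gbr.fit(X_train, y_train)
pred = gbr.predict(X_test)

rmse = np.sqrt(mean_squared_error(y_test, pred))  # same units as the target
r2 = r2_score(y_test, pred)                       # share of variance explained
print(f"RMSE: {rmse:.2f}  R2: {r2:.3f}")
```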

Hyperparameter Tuning

We used K-fold cross-validation to split the dataset and defined a parameter grid for tuning the hyperparameters of the gradient boosting regressor. A grid search with GridSearchCV identified the best estimator and parameters for the model, reducing the error score to 306.
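A minimal sketch of the K-fold grid search follows. The article does not list the grid it searched, so the parameter values below are hypothetical, and synthetic data stands in for the blueberry features.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for the blueberry data (assumption)
X, y = make_regression(n_samples=300, n_features=16, noise=20.0, random_state=1)

# Hypothetical grid; the original values are not given in the article
param_grid = {
    "n_estimators": [100, 200],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}
cv = KFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(
    GradientBoostingRegressor(random_state=1),
    param_grid,
    cv=cv,
    scoring="neg_root_mean_squared_error",  # negated RMSE, higher is better
)
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV RMSE:", -search.best_score_)
best_model = search.best_estimator_  # refit on all data with the best params
```

The grid grows multiplicatively (here 2 × 2 × 2 = 8 candidates × 5 folds = 40 fits), so it is worth starting small and widening only the parameters that matter.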

SHAP Model Explainer

The SHAP library was used to explain the machine learning model’s predictions by calculating Shapley values, which measure each feature’s contribution to the predicted target. The resulting plot indicated that the "AverageRainingDays" feature had the most significant impact on predicting the target variable, while the "andrena" feature had the least influence.

Deployment of the Model Using Flask API

Finally, we deploy the machine learning model as a Flask API on the Render (render.com) cloud service platform.

To serve the model as an API in the cloud, we first save the trained model to a file with the joblib library (conventionally with a `.joblib` extension). The following code snippet trains an XGBoost regressor for the API and saves it with joblib after training:

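```python
import joblib
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from xgboost import XGBRegressor

# X_train, X_test, y_train, y_test come from the earlier train/test split.
# Remove the 'n_cluster' feature (presumably the KMeans cluster label added
# during feature engineering) so the API only needs the original inputs.
X_train_n = X_train.drop('n_cluster', axis=1)
X_test_n = X_test.drop('n_cluster', axis=1)

# train a model for Flask API creation
xgb_model = XGBRegressor(max_depth=9, min_child_weight=7, subsample=1.0)
xgb_model.fit(X_train_n, y_train)

# evaluate on the held-out test set
pr = xgb_model.predict(X_test_n)
err = mean_absolute_error(y_test, pr)              # MAE
rmse_n = np.sqrt(mean_squared_error(y_test, pr))   # RMSE

# after training, save the model using the joblib library
joblib.dump(xgb_model, 'wbb_xgb_model2.joblib')
```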

This code snippet saves the model file in preparation for creating the Flask API. The full article also covers the repository structure, the `app.py` file for the Flask application, the `model.py` file for model prediction, and deployment on the Render platform, and it concludes with lessons learned and frequently asked questions about crop yield prediction using machine learning in agriculture. Beyond helping farmers, crop yield prediction can also help government agencies determine the price of crop outputs and implement suitable measures for the storage and distribution of crop yields.
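The article’s `app.py` and `model.py` are not reproduced in this excerpt. As a hedged illustration only, a minimal Flask app serving the saved model might look like the sketch below. The `/predict` route, the JSON payload shape, the `create_app` factory and the `main` helper are our own assumptions, and the column list assumes the feature names from the Data Description (with ‘n_cluster’ already dropped).

```python
import joblib
import pandas as pd
from flask import Flask, jsonify, request

# Assumed training column order; it must match what the model was fitted on
FEATURES = [
    "clonesize", "honeybee", "bumbles", "andrena", "osmia",
    "MaxOfUpperTRange", "MinOfUpperTRange", "AverageOfUpperTRange",
    "MaxOfLowerTRange", "MinOfLowerTRange", "AverageOfLowerTRange",
    "RainingDays", "AverageRainingDays", "fruitset", "fruitmass", "seeds",
]


def create_app(model):
    """Build a Flask app that serves predictions from an already-loaded model."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        missing = [f for f in FEATURES if f not in payload]
        if missing:
            return jsonify({"error": f"missing features: {missing}"}), 400
        # one-row DataFrame with the columns in training order
        row = pd.DataFrame([[float(payload[f]) for f in FEATURES]], columns=FEATURES)
        return jsonify({"yield": float(model.predict(row)[0])})

    return app


def main(model_path="wbb_xgb_model2.joblib"):
    """Load the joblib file saved above and start the development server."""
    create_app(joblib.load(model_path)).run(host="0.0.0.0", port=5000)
```

Calling `main()` starts Flask’s development server locally. Passing the model into `create_app` keeps loading separate from serving, which makes the route easy to test with Flask’s test client; the start command you configure on Render depends on your setup and is not specified in this excerpt.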