How to Deploy a Machine Learning Model to Production Using Flask

The machine learning process involves many steps, from identifying the problem to deploying to production, and even retraining the model or building pipelines. All of this sounds quite complicated, and it is, but if you’re a beginner who wants to learn how to deploy a simple machine learning model, this is the right place to start. 



The tools and steps we’ll be using are:
  1. Find a simple problem to solve using an ML model
  2. Use Google Colab to import the data samples and create the EDA and model
  3. Create our Exploratory Data Analysis
  4. Build a simple linear regression model to predict the outcome
  5. Save/Serialize the model to use it in production
  6. Create Project Structure (boilerplate) to work with Flask
  7. Create the API endpoints with Flask
  8. Deploy the model to a real server using DigitalOcean
  9. Resources



Step #1 - Find a simple problem to solve using an ML model


The goal of this post is not to solve a hard problem. It is to walk through every step involved in deploying a machine learning model to a production system, with an end-to-end perspective. For that reason we chose the simplest possible problem: “Predict the salary of an engineer given their years of experience”. 
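In plain terms, a simple linear regression for this problem will learn a line of the form salary = intercept + coefficient * years_of_experience; the exact intercept and coefficient are whatever the model fits from the data later on, not values we choose up front.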


Step #2 - Use Google Colab to create data, EDA and Model


The best way to save time and resources is to use Google Colab for our data, EDA, and modeling. The simplest way to keep things organized is to go to your Google Drive, click on “Connect more apps”, search for Colaboratory, and follow the screenshots below.

Note: if you don’t know what Colab is, please refer to this link: https://research.google.com/colaboratory/faq.html





You are ready. Now let’s create our first Notebook


This will open the next window


Rename the project





Now let’s get some sample data. Go to the following link and download the CSV, then upload that file to your Google Drive, inside the same folder where you created the Google Colab file.




Now we have to connect the CSV file to Google Colab. To do this, follow the tutorial below; it's quite simple and a necessary step for reading the data. I followed these simple steps.
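In case it helps, the tutorial essentially boils down to mounting your Google Drive inside the Colab runtime. A minimal sketch (the mount point /content/drive is Colab's default):

# Mount Google Drive so the notebook can read files stored in it
from google.colab import drive

drive.mount('/content/drive')
# After authorizing, your Drive appears under /content/drive/MyDrive/...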




Once everything is done you should see this folder tree (with your own files)


Copy that path; it will help us import the data. Paste the following commands into your Google Colab notebook to check that everything is working well. 



import pandas as pd

csv_in_drive = "/content/drive/MyDrive/your_path/salary.csv"

df = pd.read_csv(csv_in_drive)
df.head()



My result



We can now see the table with the CSV data. It works!

The problem statement is very clear about our purpose, and we want to keep it as simple as possible. For this reason we’ll be using just over 20 historical records with only two columns: “YearsExperience” and “Salary”. 
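To give an idea of the file's shape, the contents look something like this (the numbers below are invented purely for illustration; your CSV will have its own values):

YearsExperience,Salary
1.0,40000
3.5,60000
7.0,90000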

Let’s switch our Google Colab to dark mode!


Result


Much better!

Step #3 - Create our Exploratory Data Analysis


For this step you can open my Google Colab here: https://colab.research.google.com/drive/13o5u6Hwvs1GlnTL7EVRyLovQZQiqUhN3?usp=sharing , but remember to change the paths to your own files and follow along with this description. I'll copy and paste the Python code here; take your time to understand what I explain in the notebook.

import pandas as pd

csv_in_drive = "/content/drive/MyDrive/6- Marketing/1-Blog Posts/1- Salary Prediction Blog Post/salary.csv"

df = pd.read_csv(csv_in_drive)

df.head()

"""# Importing Libraries"""

import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

"""# EDAs

This is a simple Exploratory Data Analysis
"""

# to check the amount of data
df.shape

# To check if we have any null values in the dataset
df.isnull().values.any()

## dividing data
train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)

## train_set copy
df_copy = train_set.copy()

## exploratory analysis
df_copy.describe()

# search for correlations
df_copy.corr()

df_copy.plot.scatter(x='YearsExperience', y='Salary')

"""We have intentionally created a high correlated dataset to build a simple linear regression"""

sns.regplot(x='YearsExperience', y='Salary', data=df_copy)

Remember you can access my Google Colab here: https://colab.research.google.com/drive/13o5u6Hwvs1GlnTL7EVRyLovQZQiqUhN3?usp=sharing

Step #4 - Build a simple linear regression model to predict the outcome


Now I'm going to create the model

"""# Building the Model
Now that we have highly correlated data, we want to build the model
"""

## building the model
test_set_full = test_set.copy()
test_set = test_set.drop(["Salary"], axis=1)

test_set.head()

train_labels = train_set["Salary"]
train_labels.head()

## with train data
train_set_full = train_set.copy()
train_set = train_set.drop(["Salary"], axis=1)

train_set.head()

import warnings
warnings.filterwarnings(action="ignore", module="scipy", message="^internal gelsd")

lin_reg = LinearRegression()
lin_reg.fit(train_set, train_labels)

print("Coefficients", lin_reg.coef_)
print("Intercept", lin_reg.intercept_)

salary_pred = lin_reg.predict(test_set)
salary_pred

print(salary_pred)
print(test_set_full['Salary'])

"""Seems like we have good results comparing 'real' data with the 'predicted' data"""

# Let's check the scores

lin_reg.score(test_set, test_set_full["Salary"])

r2_score(test_set_full["Salary"], salary_pred)

r2 = r2_score(test_set_full["Salary"], salary_pred)
r2

plt.scatter(test_set_full["YearsExperience"], test_set_full["Salary"], color="blue")
plt.plot(test_set_full["YearsExperience"], salary_pred, color="red", linewidth=2)
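As an extra sanity check (not in the original notebook), we can also ask the model for a single prediction; the 5 years of experience below is just an arbitrary example value:

# Predict the salary for a hypothetical engineer with 5 years of experience
lin_reg.predict(pd.DataFrame({"YearsExperience": [5.0]}))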

Conclusion

With this we have finished a very simple model; it will help us understand how to deploy it to production.


Step #5 - Save/Serialize the model to use it in production


Now I'm going to save/serialize the model.



"""# Persisting the Model"""

## Model persistence
import pickle

with open ("/content/drive/MyDrive/6- Marketing/1-Blog Posts/1- Salary Prediction Blog Post/python_lin_reg_model.pkl", "wb") as file_handler:
    pickle.dump(lin_reg, file_handler)
    
with open ("/content/drive/MyDrive/6- Marketing/1-Blog Posts/1- Salary Prediction Blog Post/python_lin_reg_model.pkl", "rb") as file_handler:
    loaded_pickle = pickle.load(file_handler)
    
loaded_pickle

# Just for testing purposes, let's use joblib instead of pickle:

import joblib

joblib.dump(lin_reg, "/content/drive/MyDrive/6- Marketing/1-Blog Posts/1- Salary Prediction Blog Post/linear_regression_model.pkl")
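To confirm the serialized file is usable outside the training code (which is essentially what the Flask app will do later), we can load it back and run a prediction. A minimal sketch, assuming the same file path used above:

import joblib
import pandas as pd

# Load the serialized model back from Drive
model = joblib.load("/content/drive/MyDrive/6- Marketing/1-Blog Posts/1- Salary Prediction Blog Post/linear_regression_model.pkl")

# Predict the salary for an example value of 5 years of experience
model.predict(pd.DataFrame({"YearsExperience": [5.0]}))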

"""# Conclusion

With this we have finished a very simple model; it will help us understand how to deploy the result to production.
"""



Step #6 - Create Project Structure (boilerplate) to work with Flask


Great, we have the EDA and the model ready, and we even have the .pkl file with the result, so now we have to create the service that exposes it to the real world. But before that we have to organize the structure of the Flask project. This gives us a common pattern for where the files live and lets us extend the same server with more models later (there is a sketch of the endpoint file at the end of this step).

So let's start by creating our folder structure. Open a terminal and type the following. If you don't know what a terminal is, go here to learn about it.

$ mkdir 1-simple-salary
$ cd 1-simple-salary

Then let’s install some packages. The first one is virtualenv. If you don’t know what a virtualenv is, please refer to this resource


$ pip3 install virtualenv
$ virtualenv .
$ source bin/activate



Now let’s use Git and GitHub to track changes. It is very important to use Git now because the deployment will read the changes from our GitHub repository. So, create a repository on GitHub and then initialize Git on our machine.


git init
git add .
git commit -m "first commit"
git branch -M main
git remote add origin git@github.com:your_user/project_name.git
git push origin main
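As a preview of the endpoint we will build in the next step, here is a minimal sketch of what an app.py inside this folder could look like. It is not the final code, just a sketch under a few assumptions: Flask, scikit-learn and joblib are installed in the virtualenv, the serialized model from Step #5 has been copied into the project as linear_regression_model.pkl, and the route name and JSON field names are illustrative choices, not fixed by anything above.

import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the serialized linear regression once, when the server starts
model = joblib.load("linear_regression_model.pkl")

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"years_experience": 5}
    data = request.get_json()
    years = float(data["years_experience"])
    predicted_salary = float(model.predict([[years]])[0])
    return jsonify({"predicted_salary": predicted_salary})

if __name__ == "__main__":
    app.run(debug=True)

With the server running locally, you could test it with something like:

$ curl -X POST -H "Content-Type: application/json" -d '{"years_experience": 5}' http://127.0.0.1:5000/predict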





Resources



Daniel Morales Perez
