Connect with us

Hi, what are you looking for?


Designing and Deploying a Machine Learning Python Application (Part 2) | by Noah Haglund | Feb, 2024

Designing and Deploying a Machine Learning Python Application (Part 2) | by Noah Haglund | Feb, 2024

As we haven’t quite solved the key problems, let’s dig in just a bit further before getting into the low-level nitty-gritty. As stated by Heroku:

Web applications that process incoming HTTP requests concurrently make much more efficient use of dyno resources than web applications that only process one request at a time. Because of this, we recommend using web servers that support concurrent request processing whenever developing and running production services.

The Django and Flask web frameworks feature convenient built-in web servers, but these blocking servers only process a single request at a time. If you deploy with one of these servers on Heroku, your dyno resources will be underutilized and your application will feel unresponsive.

We are already ahead of the game by utilizing worker multiprocessing for the ML task, but can take this a step further by using Gunicorn:

Gunicorn is a pure-Python HTTP server for WSGI applications. It allows you to run any Python application concurrently by running multiple Python processes within a single dyno. It provides a perfect balance of performance, flexibility, and configuration simplicity.

Okay, awesome, now we can utilize even more processes, but there’s a catch: each new worker Gunicorn worker process will represent a copy of the application, meaning that they too will utilize the base ~150MB RAM in addition to the Heroku process. So, say we pip install gunicorn and now initialize the Heroku web process with the following command:

gunicorn <DJANGO_APP_NAME_HERE>.wsgi:application --workers=2 --bind=$PORT

The base ~150MB RAM in the web process turns into ~300MB RAM (base memory usage multipled by # gunicorn workers).

While being cautious of the limitations to multithreading a Python application, we can add threads to workers as well using:

gunicorn <DJANGO_APP_NAME_HERE>.wsgi:application --threads=2 --worker-class=gthread --bind=$PORT

Even with problem #3, we can still find a use for threads, as we want to ensure our web process is capable of processing more than one request at a time while being careful of the application’s memory footprint. Here, our threads could process miniscule requests while ensuring the ML task is distributed elsewhere.

Either way, by utilizing gunicorn workers, threads, or both, we are setting our Python application up to process more than one request at a time. We’ve more or less solved problem #2 by incorporating various ways to implement concurrency and/or parallel task handling while ensuring our application’s critical ML task doesn’t rely on potential pitfalls, such as multithreading, setting us up for scale and getting to the root of problem #3.

Okay so what about that tricky problem #1. At the end of the day, ML processes will typically end up taxing the hardware in one way or another, whether that would be memory, CPU, and/or GPU. However, by using a distributed system, our ML task is integrally linked to the main web process yet handled in parallel via a Celery worker. We can track the start and end of the ML task via the chosen Celery broker, as well as review metrics in a more isolated manner. Here, curtailing Celery and Heroku worker process configurations are up to you, but it is an excellent starting point for integrating a long-running, memory-intensive ML process into your application.

Now that we’ve had a chance to really dig in and get a high level picture of the system we are building, let’s put it together and focus on the specifics.

For your convenience, here is the repo I will be mentioning in this section.

First we will begin by setting up Django and Django Rest Framework, with installation guides here and here respectively. All requirements for this app can be found in the repo’s requirements.txt file (and Detectron2 and Torch will be built from Python wheels specified in the Dockerfile, in order to keep the Docker image size small).

The next part will be setting up the Django app, configuring the backend to save to AWS S3, and exposing an endpoint using DRF, so if you are already comfortable doing this, feel free to skip ahead and go straight to the ML Task Setup and Deployment section.

Django Setup

Go ahead and create a folder for the Django project and cd into it. Activate the virtual/conda env you are using, ensure Detectron2 is installed as per the installation instructions in Part 1, and install the requirements as well.

Issue the following command in a terminal:

django-admin startproject mltutorial

This will create a Django project root directory titled “mltutorial”. Go ahead and cd into it to find a file and a mltutorial sub directory (which is the actual Python package for your project).


Open and add ‘rest_framework’, ‘celery’, and ‘storages’ (needed for boto3/AWS) in the INSTALLED_APPS list to register those packages with the Django project.

In the root dir, let’s create an app which will house the core functionality of our backend. Issue another terminal command:

python startapp docreader

This will create an app in the root dir called docreader.

Let’s also create a file in docreader titled In it, define a simple function for testing our setup that takes in a variable, file_path, and prints it out:

def mltask(file_path):
return print(file_path)

Now getting to structure, Django apps use the Model View Controller (MVC) design pattern, defining the Model in, View in, and Controller in Django Templates and Using Django Rest Framework, we will include serialization in this pipeline, which provide a way of serializing and deserializing native Python dative structures into representations such as json. Thus, the application logic for exposing an endpoint is as follows:

Database ← → ← → ← → ← →

In docreader/, write the following:

from django.db import models
from django.dispatch import receiver
from .mltask import mltask
from django.db.models.signals import(

class Document(models.Model):
title = models.CharField(max_length=200)
file = models.FileField(blank=False, null=False)

@receiver(post_save, sender=Document)
def user_created_handler(sender, instance, *args, **kwargs):

This sets up a model Document that will require a title and file for each entry saved in the database. Once saved, the @receiver decorator listens for a post save signal, meaning that the specified model, Document, was saved in the database. Once saved, user_created_handler() takes the saved instance’s file field and passes it to, what will become, our Machine Learning function.

Anytime changes are made to, you will need to run the following two commands:

python makemigrations
python migrate

Moving forward, create a file in docreader, allowing for the serialization and deserialization of the Document’s title and file fields. Write in it:

from rest_framework import serializers
from .models import Document

class DocumentSerializer(serializers.ModelSerializer):
class Meta:
model = Document
fields = [

Next in, where we can define our CRUD operations, let’s define the ability to create, as well as list, Document entries using generic views (which essentially allows you to quickly write views using an abstraction of common view patterns):

from django.shortcuts import render
from rest_framework import generics
from .models import Document
from .serializers import DocumentSerializer

class DocumentListCreateAPIView(

queryset = Document.objects.all()
serializer_class = DocumentSerializer

Finally, update in mltutorial:

from django.contrib import admin
from django.urls import path, include

urlpatterns = [
path('api/', include('docreader.urls')),

And create in docreader app dir and write:

from django.urls import path

from . import views

urlpatterns = [
path('create/', views.DocumentListCreateAPIView.as_view(), name='document-list'),

Now we are all setup to save a Document entry, with title and field fields, at the /api/create/ endpoint, which will call mltask() post save! So, let’s test this out.

To help visualize testing, let’s register our Document model with the Django admin interface, so we can see when a new entry has been created.

In docreader/ write:

from django.contrib import admin
from .models import Document

Create a user that can login to the Django admin interface using:

python createsuperuser

Now, let’s test the endpoint we exposed.

To do this without a frontend, run the Django server and go to Postman. Send the following POST request with a PDF file attached:

If we check our Django logs, we should see the file path printed out, as specified in the post save mltask() function call.

AWS Setup

You will notice that the PDF was saved to the project’s root dir. Let’s ensure any media is instead saved to AWS S3, getting our app ready for deployment.

Go to the S3 console (and create an account and get our your account’s Access and Secret keys if you haven’t already). Create a new bucket, here we will be titling it ‘djangomltest’. Update the permissions to ensure the bucket is public for testing (and revert back, as needed, for production).

Now, let’s configure Django to work with AWS.

Add your model_final.pth, trained in Part 1, into the docreader dir. Create a .env file in the root dir and write the following:

AWS_ACCESS_KEY_ID = <Add your Access Key Here>
AWS_SECRET_ACCESS_KEY = <Add your Secret Key Here>
AWS_STORAGE_BUCKET_NAME = 'djangomltest'

MODEL_PATH = './docreader/model_final.pth'

Update to include AWS configurations:

import os
from dotenv import load_dotenv, find_dotenv


#AWS Config
AWS_DEFAULT_ACL = 'public-read'
AWS_S3_OBJECT_PARAMETERS = {'CacheControl': 'max-age=86400'}

STATICFILES_STORAGE = 'mltutorial.storage_backends.StaticStorage'
DEFAULT_FILE_STORAGE = 'mltutorial.storage_backends.PublicMediaStorage'

STATIC_URL = f'https://{AWS_S3_CUSTOM_DOMAIN}/static/'
MEDIA_URL = f'https://{AWS_S3_CUSTOM_DOMAIN}/media/'

Optionally, with AWS serving our static and media files, you will want to run the following command in order to serve static assets to the admin interface using S3:

python collectstatic

If we run the server again, our admin should appear the same as how it would with our static files served locally.

Once again, let’s run the Django server and test the endpoint to make sure the file is now saved to S3.

ML Task Setup and Deployment

With Django and AWS properly configured, let’s set up our ML process in As the file is long, see the repo here for reference (with comments added in to help with understanding the various code blocks).

What’s important to see is that Detectron2 is imported and the model is loaded only when the function is called. Here, we will call the function only through a Celery task, ensuring the memory used during inferencing will be isolated to the Heroku worker process.

So finally, let’s setup Celery and then deploy to Heroku.

In mltutorial/ write:

from .celery import app as celery_app
__all__ = ('celery_app',)

Create in the mltutorial dir and write:

import os

from celery import Celery

# Set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mltutorial.settings')

# We will specify Broker_URL on Heroku
app = Celery('mltutorial', broker=os.environ['CLOUDAMQP_URL'])

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
# should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django apps.

@app.task(bind=True, ignore_result=True)
def debug_task(self):
print(f'Request: {self.request!r}')

Lastly, make a in docreader and write:

from celery import shared_task
from .mltask import mltask

def ml_celery_task(file_path):
return "DONE"

This Celery task, ml_celery_task(), should now be imported into and used with the post save signal instead of the mltask function pulled directly from Update the post_save signal block to the following:

@receiver(post_save, sender=Document)
def user_created_handler(sender, instance, *args, **kwargs):

And to test Celery, let’s deploy!

In the root project dir, include a Dockerfile and heroku.yml file, both specified in the repo. Most importantly, editing the heroku.yml commands will allow you to configure the gunicorn web process and the Celery worker process, which can aid in further mitigating potential problems.

Make a Heroku account and create a new app called “mlapp” and gitignore the .env file. Then initialize git in the projects root dir and change the Heroku app’s stack to container (in order to deploy using Docker):

$ heroku login
$ git init
$ heroku git:remote -a mlapp
$ git add .
$ git commit -m "initial heroku commit"
$ heroku stack:set container
$ git push heroku master

Once pushed, we just need to add our env variables into the Heroku app.

Go to settings in the online interface, scroll down to Config Vars, click Reveal Config Vars, and add each line listed in the .env file.

You may have noticed there was a CLOUDAMQP_URL variable specified in We need to provision a Celery Broker on Heroku, for which there are a variety of options. I will be using CloudAMQP which has a free tier. Go ahead and add this to your app. Once added, the CLOUDAMQP_URL environment variable will be included automatically in the Config Vars.

Finally, let’s test the final product.

To monitor requests, run:

$ heroku logs --tail

Issue another Postman POST request to the Heroku app’s url at the /api/create/ endpoint. You will see the POST request come through, Celery receive the task, load the model, and start running pages:

We will continue to see the “Running for page…” until the end of the process and you can check the AWS S3 bucket as it runs.

Congrats! You’ve now deployed and ran a Python backend using Machine Learning as a part of a distributed task queue running in parallel to the main web process!

As mentioned, you will want to adjust the heroku.yml commands to incorporate gunicorn threads and/or worker processes and fine tune celery. For further learning, here’s a great article on configuring gunicorn to meet your app’s needs, one for digging into Celery for production, and another for exploring Celery worker pools, in order to help with properly managing your resources.

Happy coding!

Unless otherwise noted, all images used in this article are by the author

The article was first published here

Click to comment

Leave a Reply

Your email address will not be published. Required fields are marked *

You May Also Like


10:29 a.m. ET, February 22, 2024 AT&T recently applied for a waiver to allow it to stop servicing traditional landlines in California From CNN’s...


Everyone in Kyle Hausmann-Stokes’ compassionate feature My Dead Friend Zoe has suffered a loss. Merit (Sonequa Martin-Green), a nervous Afghanistan war veteran, is reeling...


Video: Bandai Namco Reveals New Gameplay Footage Of Dragon Ball Z: Kakarot DLC 6  Nintendo Life Dragon Ball Z: Kakarot DLC Trailer Previews Goku’s Next...


Using 3D storage techniques, scientists at the University of Shanghai for Science and Technology developed an optical disk capable of accommodating 1.6 petabits of...