Building an Object Recognition App and Protecting It From Bots

Introduction

I love building tech for other people to use.

Unfortunately, I learned early on that if your application is accessible to users, it’s also vulnerable to cyberattacks. This is a problem that developers everywhere face, from the person adding a form to their blog to the programmer building applications used by millions.

That’s why I was so excited to join UnifyID, a startup building passwordless authentication solutions, as an intern this summer. When I started, a particular product called HumanDetect really caught my eye. Quoting the documentation,

“UnifyID HumanDetect helps you determine whether a user of your app is a human or a bot. While the purpose is similar to a CAPTCHA, HumanDetect is completely passive, creating a frictionless user experience.”

I imagine most people have come across CAPTCHAs in one form or another, so I can’t be the only person who’s super annoyed each time I’m asked to click a bunch of small boxes.

reCAPTCHA

From a developer’s perspective, CAPTCHAs aren’t ideal either. They can be time-consuming, which hurts the user experience by interrupting the flow of an application. Additionally, they may be difficult to complete (especially on the smaller screens of mobile devices), making it harder for some users to access features. However, for a long time CAPTCHAs have been one of the few reliable ways to verify humans, reduce spam, and prevent bot attacks. Despite their downsides, they’ve become a fixture of software as we know it.

UnifyID HumanDetect is different: machine learning algorithms passively determine whether a user is human, replacing CAPTCHAs while not interrupting the user flow. Additionally, while CAPTCHAs work best for web apps, HumanDetect is designed for mobile applications, which don’t have many reliable human authentication methods. To me, this is exciting—HumanDetect completely eliminates the need for explicit user actions.

In this blog post, I’ll outline how I built a simple object recognition app for iOS. It allows users to take a picture using the phone’s camera, which is sent to the Flask backend. There, the image is run through a neural network and the inference results are returned to the app.

After finishing the basic functionality of the app, I added HumanDetect to protect my app from bot attacks, which should give you a good idea of how developers can take advantage of this tool. Finally, I’ve linked all my code so that you can run everything yourself and see how you can use HumanDetect to protect your own apps.

Building the Flask Server

The first part of this project involved setting up a Flask server to act as the backend of the app. Functionally, it will accept a POST request that contains an image, use a machine learning model to generate predictions based on the picture, and return the five most likely results.

I chose to use Python for the server side of the project because it’s the language I’m most comfortable with, and it’s extremely easy to code and debug. Plus, it’s widely used for machine learning, so adding object classification should be a piece of cake. I decided to use Flask over another framework like Django for similar reasons. I’ve previously used Flask for a couple of projects and it’s also lightweight, meaning it’s super simple to get up and running.

To start off, I needed to set up my environment. Isolating the packages I was using for this project was crucial since I’d need to replicate everything when I deployed my app to a cloud platform. I chose to use Conda simply because it’s what I’m most comfortable with (there’s a theme here, in case you haven’t noticed), although virtualenv would have been fine, too.

Next, I installed Flask and created a simple app that was just a webpage with “HumanDetect Example” on it. After running it locally and verifying that everything was set up correctly, I created a project in Heroku and prepared to deploy my app.
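That first app really was minimal; here’s a sketch of roughly what it amounted to (just enough to confirm the toolchain and deployment worked):

```python
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    # Placeholder page, just to confirm the server is up
    return "HumanDetect Example"

if __name__ == "__main__":
    app.run()
```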

HumanDetect Webpage

To do this, I had to set up a custom CI/CD pipeline for GitLab that would trigger a fresh deployment each time I made a commit, which ended up taking quite a bit of time. Things are a lot simpler if you’re using GitHub (which is where the example code for this project is hosted, fortunately).
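For reference, the pipeline boiled down to a single deploy job. A rough sketch of the kind of .gitlab-ci.yml involved (the dpl-based Heroku deploy is a common pattern from GitLab’s docs; $HEROKU_APP_NAME and $HEROKU_API_KEY are CI variables you’d configure yourself):

```yaml
# Sketch: redeploy to Heroku on each commit to master
deploy:
  stage: deploy
  image: ruby:latest            # dpl is distributed as a Ruby gem
  script:
    - gem install dpl
    - dpl --provider=heroku --app=$HEROKU_APP_NAME --api-key=$HEROKU_API_KEY
  only:
    - master
```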

With most of the setup out of the way, I could finally begin building the functionality. First, I needed a way to accept an image via a POST request. Although I tried encoding the file as a string, I ended up accepting the file as part of a multipart form POST body and saving it to an ./uploads folder.

import os
from flask import Flask, request

app = Flask(__name__)

@app.route("/", methods=['GET', 'POST'])
def home():
    if request.method == 'POST':
        file = request.files['file']
        filename = os.path.join('./uploads', 'image.png')
        file.save(filename)

Arguably the most important part of this whole project is the machine learning object recognition code. Although I could have made it quite complex, I made a couple of decisions that simplified this part as much as possible. I decided to use Keras because it is incredibly easy to use, and includes several common pre-trained models that only take a few lines of code to implement. Plus, I’m not too concerned about performance, so there isn’t really a particular reason to use TensorFlow or PyTorch in this case.

Keras provides a number of Convolutional Neural Networks (CNNs) covering the most common and highest-performing architectures for image classification. Because free Heroku dynos have a memory constraint, I wanted to minimize the size of the model while keeping accuracy high. I ultimately decided to go with the MobileNet architecture, a highly efficient network that performs within a few percentage points of VGG, ResNet, and Inception models. Since Keras provides pre-trained weights for the ImageNet dataset, I decided to use them rather than training my own model.

Before feeding an image into the model, I needed to preprocess it so I would get the most accurate classification results. The CNN expects RGB images at a 224×224 resolution, but the images that I’ll be taking from the iOS app won’t have these dimensions. Therefore, I needed to resize each image using OpenCV. I chose resizing over cropping the image down to a perfect square because cropping could cut out important elements, and the trained model should be robust enough to ignore minor changes to the aspect ratio.

Once the preprocessed image is fed into the model, the results need to be returned via a response to the POST request. I decided to get the five classes with the highest probabilities, reformat them into a single clean string, and return this string.

# Imports and model setup (done once, when the server starts)
import cv2
import numpy as np
from keras.preprocessing import image
from keras.applications.mobilenet import MobileNet, preprocess_input, decode_predictions

model = MobileNet(weights='imagenet')

# Inside the POST handler: preprocess, predict, and format the top-5 results
img = cv2.imread(filename)
img = cv2.resize(img, (224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
preds = decode_predictions(model.predict(x), top=5)[0]

preds_formatted = ", ".join([
    f"{class_description}: {score*100:.2f}%"
    for (_, class_description, score) in preds
])

print("predictions: ", preds_formatted, "\n")
return preds_formatted

To test that everything was working, I wrote a simple Python script that submits this image of a taxi via a POST request.
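My actual test script isn’t shown here, but a minimal stdlib-only version might look like this (the server URL and image path are placeholders; it builds the same multipart shape the iOS app constructs later):

```python
import io
import urllib.request
import uuid

def build_multipart(field, filename, payload):
    """Encode a single file upload as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: image/png\r\n\r\n")
    body.write(payload)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return boundary, body.getvalue()

def classify(url, image_path):
    """POST an image to the Flask server and return its response text."""
    with open(image_path, "rb") as f:
        boundary, body = build_multipart("file", "image.png", f.read())
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

# classify("https://<your-app>.herokuapp.com/", "taxi.png")
```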

Taxi

Here’s the returned response:

cab: 87.69%, police_van: 5.23%, racer: 1.45%, sports_car: 1.33%, car_wheel: 1.23%

Success! With the Flask app complete, I moved on to the next part of the project.

Building the iOS App

Let me make something clear: I’m not an iOS developer. I’ve built several apps for Android and the web, but I’ve never really tried Swift or Xcode—in fact, I haven’t even owned an Apple device in the last 7 years. Therefore, everything about this iOS thing was new for me, and I had to lean pretty heavily on Google and Stack Overflow.

Luckily, the Apple developer environment seemed relatively intuitive, and was in many ways simpler than its Android counterpart. It took me some time to go through a few basic iOS development guides online, but before long I was up and running with my first app in Xcode.

The most important function of the app is that it allows a user to take a picture using the phone’s camera. To accomplish this, I used a view controller called UIImagePickerController, which adds the ability to capture images in just a few lines of code. I just followed the instructions from an article that I found on Google, and got this part working pretty quickly.

iOS Screenshot 1

Now that the user can take a picture, it needs to be sent via a POST request to the Flask server. Because of the way the backend expects the request to be made, I ended up having to manually add some metadata and body content. Although it looks a bit messy (and there might be a cleaner way to do it), I eventually did get it working, which is what counts.

let filename = "image.png"
let boundary = UUID().uuidString
let config = URLSessionConfiguration.default
let session = URLSession(configuration: config)
var urlRequest = URLRequest(url: URL(string: flaskURL)!)
urlRequest.httpMethod = "POST"
urlRequest.setValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")
var data = Data()
                
data.append("\r\n--\(boundary)\r\n".data(using: .utf8)!)
data.append("Content-Disposition: form-data; name=\"file\"; filename=\"\(filename)\"\r\n".data(using: .utf8)!)
data.append("Content-Type: image/png\r\n\r\n".data(using: .utf8)!)
data.append(image.pngData()!)
data.append("\r\n--\(boundary)--\r\n".data(using: .utf8)!)

Finally, I added a few UI elements to finish up the iOS app. I set up a loading spinner that is activated just after the picture is taken and deactivated once the response to the POST request is received. I also added a pop-up alert that displays the object recognition results to the user.

iOS Results Screenshots

And that’s it! The main functionality of the object recognition app is now complete.

Protecting From Bots

This project is a great example of a possible use case for HumanDetect. Since the object recognition functionality involves quite a bit of machine learning and heavy processing, it’s important to ensure that each request to the backend is made by legitimate users of the app. An attack involving many unauthorized requests could become very costly (both computationally and financially) or even cause the app to become overwhelmed and crash. Implementing a verification step with HumanDetect before each POST request is processed can protect apps like this from attacks.

Adding HumanDetect to the app was surprisingly easy, as the documentation provides step-by-step instructions for adding it to the frontend and backend. Before I wrote any additional code, I created a new developer account at developer.unify.id. After setting up a new project in the dashboard, I came across a page with a bunch of technical jargon.

UnifyID Dashboard

For HumanDetect, the only things that matter are API keys and SDK keys. An API key gives access to the Server APIs that are used to verify whether a request to the backend is from a human or a bot, while an SDK key is used to initialize the iOS SDK and allows the app to generate a unique token that encodes information about the human (or bot) user. For this project, I went ahead and created one of each.

A few things need to happen on the iOS side. After adding the HumanDetect pod, I initialized the SDK in AppDelegate.swift using the SDK key generated from the dashboard.

import UnifyID

let unify : UnifyID = { try! UnifyID(
    sdkKey: "<YOUR SDK KEY>"
)}()

Next, I set up an instance of HumanDetect to utilize its functionality.

import HumanDetect
let humanDetect = unify.humanDetect

Data capture needs to be manually started right when the app first loads. This allows the app to begin recording data that will later be used to determine if the user is a human or bot. Maximizing the time when data capture is active will generally result in higher accuracy.

override func viewDidLoad() {
    super.viewDidLoad()
    humanDetect.startPassiveCapture()
}

Data capture continues until the picture is taken, at which point a token is generated and added to the same POST request as the image before it is sent to the backend.

switch humanDetect.getPassive() {
    case .success(let humanDetectToken):
                
        // Creating POST request
        let fieldName = "token"
        let fieldValue = humanDetectToken.token
        
        …

        data.append("\r\n--\(boundary)\r\n".data(using: .utf8)!)
        data.append("Content-Disposition: form-data; name=\"\(fieldName)\"\r\n\r\n".data(using: .utf8)!)
        data.append("\(fieldValue)".data(using: .utf8)!)
                
        …

        // POST request to Flask server
        session.uploadTask(with: urlRequest, from: data, completionHandler: { responseData, response, error in
                    
            …

        }).resume()

    …

}

The Flask server also has to be modified to accept the token generated by the iOS app. Right after receiving the POST request from the app, the server makes its own POST request to https://api.unify.id/v1/humandetect/verify containing the generated token, authenticated with the API key from the developer dashboard.

HEADERS = {
    'Content-Type': 'application/json',
    'X-API-Key': '<YOUR-API-KEY>',
}

@app.route("/", methods=['GET', 'POST'])
def home():
    if request.method == 'POST':
        # Use .get() so a missing field returns None instead of aborting
        file = request.files.get('file')
        token = request.form.get('token')

        if not file:
            return "Error: file not found in request"

        if not token:
            return "Error: token not found in request"

        print("token:", token)

        hd_response = requests.post('https://api.unify.id/v1/humandetect/verify', headers=HEADERS, data=json.dumps({"token": token}))

        if hd_response.status_code != 200:
            return "Error: invalid HumanDetect token"

        hd_json = hd_response.json()

        if "valid" not in hd_json or not hd_json["valid"]:
            return "Error: HumanDetect verification failed"

If the response indicates that the user is a valid human, the image is run through the Convolutional Neural Network as usual. If the token fails verification, however, the server immediately returns an error message without running the machine learning code. This ensures that bots won’t overwhelm server resources, and helps protect the integrity of the application’s infrastructure.
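The gating decision itself reduces to a small predicate on the verify response. Here’s a sketch (hd_json stands for the parsed JSON body returned by the verify endpoint, as in the server code above):

```python
def is_human(hd_json):
    """Allow the expensive CNN to run only if verification explicitly
    returned {"valid": true}; anything else is treated as a bot."""
    return isinstance(hd_json, dict) and hd_json.get("valid") is True
```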

Next Steps

The code for this HumanDetect example is available at https://github.com/UnifyID/humandetect-sample-flask and https://github.com/UnifyID/humandetect-sample-ios. Instructions for setting everything up are included in the README files. If you run into any issues or have questions about HumanDetect, feel free to contact us.

If you want to learn more about how to counter bot attacks, I’d highly suggest reading this Medium article, which goes into more detail about various solutions including HumanDetect.

Thanks for reading! I hope that this has been helpful.