Image Similarity Search — from Research to Production
Let’s say you sit down in a cozy cafe with a coffee and spot a lovely lamp on the corner table. It perfectly matches the look you’re going for at home, but how can you find one just like it? Sure, there are hundreds of lamps with funny names on Ikea’s website, but scrolling through them takes ages. There’s got to be a faster, easier way.
What if you could just take a photo of the lamp and immediately find similar ones for sale online? One of our clients had the same idea.
This is how we used computer vision to take it from idea to reality.
Overview
In this blog post, we share our experience of building an image similarity search application, from research through to production. We explain it at a high level, focusing on Machine Learning (ML) and Computer Vision (CV) concepts, with a few “optional sections” that show lower-level technical details from time to time.
Our main task was to find and show similar furniture images (chair, lamp, table, bed or sofa) for an input image uploaded by the user. The technical challenge was to implement a fairly accurate image similarity search with very small reference image sets of about 200 images per category. (The reference image set is the pre-indexed set of images that is searched to find matches for a given input image.)
For comparison, Google image search probably has millions or billions of reference images, which makes it far more likely that a similar image exists for any given input.
Here’s a visual example of our task. This is how Google image similarity search looks:
Research & Design Phase
We researched several machine learning libraries and decided to use the DeepDetect library. DeepDetect is an open source machine learning API and server written in C++.
First off, we deployed an image similarity search with the pre-trained model “ResNet50” (a 50-layer-deep convolutional neural network) and started testing. After some initial testing, we realized that our solution worked better when an “image classification” service was also used.
So the final solution was to first predict the class of the image (e.g. chair, lamp or table) and then search for similar images within that class only (e.g. find similar lamp images).
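The two-stage idea (classify first, then search only inside the predicted class) can be sketched in a few lines of Python. This is a minimal illustration and not the production code: the 4-dimensional feature vectors, the image IDs, the `find_similar` helper and the use of cosine similarity are all assumptions made for the example. In the real system the features come from the ResNet50 model and the class comes from the classification service.

```python
import numpy as np

def l2_normalize(v):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    v = np.asarray(v, dtype=np.float64)
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-12)

def find_similar(query_vec, predicted_class, index_by_class, k=3):
    """Return up to k (image_id, similarity) pairs, searching one class only."""
    ids, vectors = index_by_class[predicted_class]
    sims = l2_normalize(vectors) @ l2_normalize(query_vec)
    order = np.argsort(-sims)[:k]  # highest similarity first
    return [(ids[i], float(sims[i])) for i in order]

# Toy per-class reference index with hypothetical feature vectors.
index_by_class = {
    "lamp": (["lamp_01", "lamp_02", "lamp_03"],
             np.array([[1.0, 0.0, 0.0, 0.0],
                       [0.9, 0.1, 0.0, 0.0],
                       [0.0, 1.0, 0.0, 0.0]])),
    "chair": (["chair_01", "chair_02"],
              np.array([[0.0, 0.0, 1.0, 0.0],
                        [0.0, 0.0, 0.9, 0.1]])),
}

# Pretend the classifier labelled the uploaded photo as a lamp.
query = np.array([1.0, 0.05, 0.0, 0.0])
results = find_similar(query, "lamp", index_by_class, k=2)
print(results)
```

Restricting the search to one class keeps an obviously wrong category (a chair for a lamp photo) out of the results, which matters most when each class has only a few reference images.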
Here’s a simple view of the final architecture:
More about the ResNet50 convolutional neural network:
- ResNet50 layers: http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006
- More about ResNet50: https://www.mathworks.com/help/deeplearning/ref/resnet50.html
What is a Convolutional Neural Network?
A convolutional neural network (CNN or ConvNet for short) is a class of deep neural networks, most commonly used to analyze images. Let’s briefly outline the key concepts:
- Convolution layer: repeatedly applies small image filters (aka kernels) across the input image to detect a pattern or feature.
- Max-Pooling layer: downsizes its input by keeping only the largest value in each small window, so the strongest pattern responses survive.
- Fully-Connected layer: connects every neuron in one layer to every neuron in the next layer.
- Output: the final layer, which returns the prediction(s) for an input image.
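To make the first two concepts concrete, here is a small NumPy sketch of a single convolution followed by max pooling. It is a teaching example under simple assumptions (one channel, no padding, a hand-written vertical-edge filter), not how a deep learning framework implements these layers. Like most frameworks, it computes a cross-correlation, meaning the filter is not flipped.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

# 6x6 image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written filter that responds to dark-to-bright vertical edges.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = conv2d(image, kernel)  # 4x4 map, strongest where the edge is
pooled = max_pool2d(feature_map)     # 2x2 summary of the strongest responses
print(feature_map)
print(pooled)
```

In a trained CNN the filter values are learned from data instead of being written by hand, and each layer applies many filters in parallel.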
Here's an example CNN architecture to classify handwritten digits:
First Working Version
We implemented the first working version with only 30 reference images, which is a very small image set.
We knew that we would get better results once we increased the number of reference images and tried different layers or even different convolutional neural network models.
Steps to achieve the first working version:
- Create image classification service
- Verify image predictions work
- Create image similarity search service
- Index reference images for image similarity search service
- Verify image similarity search works
Let’s look at our very first results now. The problem here is that the reference image set is too small, so it is very unlikely that the service will find a similar image.
Here are some key technical details about the creation and configuration of the image similarity search service:
Key parameters explained
How to create an image similarity search service
How to index images (this is for one image; we wrote a script to index all the different classes and images)
Image similarity search query
Image similarity search response
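As a rough illustration of the requests listed above, the sketch below builds JSON bodies modeled on DeepDetect's publicly documented REST API (a PUT to `/services/<name>` to create a service, then POSTs to `/predict` to index, build the index and search). Treat every field as an assumption to check against the DeepDetect documentation for your server version. The service name, file paths, model repository and the 224x224 input size are hypothetical. Only the `extract_layer` value comes from this post. Nothing is sent over the network and the response format is not modeled.

```python
import json

SERVICE = "simsearch"                 # hypothetical service name
EXTRACT_LAYER = "pool5/7x7_s1"        # extraction layer named in this post
SIZE = {"width": 224, "height": 224}  # assumed ResNet50 input size

def create_service_body(model_repo):
    """Body for PUT /services/<SERVICE> (field names assumed from DeepDetect docs)."""
    return {
        "mllib": "caffe",
        "description": "image similarity search",
        "type": "unsupervised",
        "parameters": {
            "input": {"connector": "image", **SIZE},
            "mllib": {"nclasses": 1000},
        },
        "model": {"repository": model_repo},
    }

def predict_body(image_uris, output):
    """Body for POST /predict; `output` selects index, build or search mode."""
    return {
        "service": SERVICE,
        "parameters": {
            "input": dict(SIZE),
            "mllib": {"extract_layer": EXTRACT_LAYER},
            "output": output,
        },
        "data": list(image_uris),
    }

# Hypothetical paths, for illustration only.
create_req = create_service_body("/opt/models/simsearch")
index_req = predict_body(["/data/lamp/lamp_001.jpg"], {"index": True})
build_req = predict_body(["/data/lamp/lamp_001.jpg"],
                         {"index": False, "build_index": True})
search_req = predict_body(["/data/query.jpg"],
                          {"search": True, "search_nn": 10})

print(json.dumps(search_req, indent=2))
```

Keeping the extraction layer in one constant matters because the indexing and search requests must use the same layer, otherwise the query vector and the indexed vectors are not comparable.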
Getting Better Results
At this stage, we focused on improving our existing solution. We gradually increased the reference image count, tried different layers of the working ResNet50 model, and tried different models with different extraction layers.
In the end, we saw that the results with 200 images and the ResNet50 model (with extract_layer “pool5/7x7_s1”) were much better than the others.
This image, showing initial results for the 10 most similar images from a random input image, is from development:
For those who are even more curious, let's briefly compare different convolutional neural network model architectures. We experimented with these models in order to get better results.
For instance, we compared ResNet50’s last pooling layer, the “7x7 avg pool” (an average-pooling layer, kernel size: 7, stride: 1), against VGG16’s last max-pooling layer, “pool5” (kernel size: 3, stride: 2).
We saw that the results with ResNet50 were better. We also tried and compared different layers of the ResNet50 model. None of these experiments is quick or simple: for every try, a new service needs to be created and the reference image set must be re-indexed.
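To show why the choice of extraction layer changes the feature vector that gets indexed, here is a NumPy sketch of what a 7x7 average pool does to a ResNet50-style feature map. It assumes a 224x224 input, for which the last convolutional block of ResNet50 produces 2048 channels of 7x7 values. The random numbers stand in for real activations.

```python
import numpy as np

def global_avg_pool(feature_map):
    """Average each channel's spatial grid: (channels, h, w) -> (channels,)."""
    return feature_map.mean(axis=(1, 2))

rng = np.random.default_rng(0)

# Stand-in for the last ResNet50 feature map (assumed 2048 channels of 7x7).
resnet_like = rng.random((2048, 7, 7))

descriptor = global_avg_pool(resnet_like)  # one number per channel
flattened = resnet_like.reshape(-1)        # the same map without pooling

print(descriptor.shape)  # (2048,)
print(flattened.shape)   # (100352,)
```

With a 7x7 kernel on a 7x7 map, the average pool collapses each channel to a single value, so every image is described by a compact 2048-dimensional vector. Extracting from an earlier layer yields a longer, more spatially detailed vector, which is one reason each layer choice requires re-indexing the whole reference set.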
The Final Product
The final product has become a beautiful web application* where the users can upload furniture images to find similar products. This is, of course, a result of teamwork so I want to mention and thank our great team here:
- ML Lead & Project Manager - Vladimir Panteleev
- Designer - Elizaveta Gaiduk
- ML Engineer - Gökhan Şimşek (myself)
- Software Engineer - Denis Zhidis
*The product itself is still in very early stages and functions with a very limited image database.
Let’s see how the final product and the image search look:
Thanks for reading 🙏 This is part two of our series on AI/ML topics. Specifically, we explore how to integrate proven technologies into products in a way that matches the needs of the end-user and the budget of the project. Part one covered Natural Language Processing with services from AWS.
Stay tuned for more!
Citations:
- A Comprehensive Guide to Convolutional Neural Networks — the ELI5 way, by Sumit Saha
- Deep Feature-Based Classifiers for Fruit Fly Identification, by Matheus Leonardo, Tiago Carvalho, Edmar Rezende, Roberto Zucchi and Fabio Faria (2018)