Building a search engine for logos
I recently created Token with my partner. It’s a search engine designed for logos, and we made it in a couple of weeks with a host of technologies I had never used before, including Node, MongoDB, Express, Angular, and OpenCV. This post details Token’s architecture as well as the main algorithms that make it work.
Unsurprisingly, figuring out exactly what you’re trying to solve is a good first step. I foolishly didn’t fully figure this out until partway into the project.
The question Token tries to answer is, “What logos look similar to this one?” This question came up one day while working with said partner, and it stuck in my head since it’s the kind of problem that software is good for. Speaking with designers, I’ve found that this question can mean different things to different people, and the motives for wanting to find similar logos can also vary.
So, despite being vague, “looking similar” is the key element that needs defining. What does it mean for two logos to look the same? What exactly is it about the shapes and colours of logos that give them a similar feel?
Image feature extraction and comparison
This problem exists in the general space known as computer vision. While I’m not an authority on the subject, OpenCV seems to be the best supported FOSS library providing a wide range of computer vision functionality. Briefly, it allows you to load, process, and extract data from images. The range of processing and data extraction abilities that it provides is quite broad, but being concerned only with logos narrows things down for us.
Deciding what features of a logo we want to capture isn’t easy. Clearly a logo is defined by a combination of colours and shapes, but how do we define them, and what should we consider important? To complicate matters, features is a bit of a loaded word in computer vision: on its own it can refer to any way of describing an aspect of an image, yet feature detection tends to refer to a particular class of algorithms. Feature detection does seem like something we’d want to do in order to recognize logos, though, and my first prototype of Token was built entirely around the feature detection and extraction facilities provided by OpenCV.
These algorithms do pretty amazing things. Broadly, they rely on finding “distinctive” (to a computer) parts of an image like corners, edges, or blobs which are given the highly descriptive name features. After these features are found, an extraction step is performed which takes the features and transforms them into feature descriptors. These describe a feature such that it can still be recognized even if it is transformed – scaled, rotated, skewed, or translated. The details of these algorithms go well beyond what I learnt, but they let you do cool things like pick out a product from a pile of things.
Ultimately, however, I found that these algorithms were not the solution I was looking for. For one, they are designed to find an object within a larger scene, which doesn’t really describe how you’d compare two logos to one another. Second, while these algorithms are designed to be quite fast – fast enough that you can match objects in a video stream – they were still way too slow for Token. Matching one logo to another using these methods took several milliseconds. Since I knew that the logo database could be on the order of millions of logos (roughly one brand per thousand people in the world¹), this would translate into a search lasting thousands of seconds. Even distributing the search across multiple servers running the matching on GPUs, you’d still be lucky to get it under 10 seconds.
Token’s main feature
Having run some experiments with several designer friends, and stared at countless logos, I realized that the problem statement needed to be defined better. Determining whether logos “look similar” can mean too many things. For instance, logos that both feature a floral motif may look “similar” to a given person despite having very different visual characteristics. Instead, the question, “what logos have a similar visual impact?” ends up being easier to solve.
This new question reveals some interesting answers. For instance, it’s unlikely that someone is going to mistake a tall, skinny logo for a short, wide one. This gives us an immediate, very cheap feature: the aspect ratio of the logo. Logos that don’t have similar aspect ratios to the one in question can simply be ignored. Better yet, this can be done while we query our database of logo features, making it super inexpensive.
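As a sketch of that prefilter (the 25% tolerance and the field names here are assumptions for illustration, not Token’s actual values):

```python
def aspect_ratio(width, height):
    """Aspect ratio of a logo's bounding box."""
    return width / height

def candidates(query_ratio, logos, tolerance=0.25):
    """Keep only logos whose aspect ratio falls within `tolerance`
    (as a fraction) of the query's ratio; everything else is
    dropped before any expensive shape comparison happens."""
    lo, hi = query_ratio * (1 - tolerance), query_ratio * (1 + tolerance)
    return [logo for logo in logos if lo <= logo["ratio"] <= hi]

logos = [{"name": "tall", "ratio": 0.4},
         {"name": "square", "ratio": 1.0},
         {"name": "wide", "ratio": 2.5}]
print([l["name"] for l in candidates(1.1, logos)])  # → ['square']
```

In practice this comparison can be folded into the database query itself, so mismatched logos never even leave the index.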
This also gives us a hint to our true major feature: the shape is the key. No matter what colour your logo is, you’re not going to mistake it for another if the shape is too dissimilar. And really, the aspect ratio is just a rough proxy for the shape. While we can weed obvious misfits out with the aspect ratio, the shape is the true feature that will measure the similarity between logos.
The question is then: How do you determine the shape of a logo? As it is now, Token uses the method that I’m about to explain, but I’m going to warn you in advance: this method is suboptimal. I’ll also get around to explaining why this is, but just keep that in mind for now.
OpenCV gives us a partial answer to determining something’s shape: it’s able to find the contours of an image. Logos are a good candidate for contour finding, since – no surprise – they tend to have well defined contours (which can be further enhanced with edge detection). Getting the contours is only part of the solution, though. Contour detection can return multiple (potentially nested) contours. Comparing all potential contours to each other has \(O(m \times n)\) complexity, which should obviously be avoided in an operation that should be able to run a hundred thousand times a second. Further, with no bounds on how many shapes can be returned, our logo features could take up very different amounts of memory, which is not ideal.
It would be nice if we could have one contour defining the shape of each logo. OpenCV can find the convex hull of a set of points for us, but this could potentially cause these two shapes to have a near-perfect match:
Even though they obviously don’t look too similar. It would be better if we could find a concave hull defining the shape of the logo. Unlike convex hulls, finding the concave hull of a set of points is not a well defined problem, as many potential solutions can exist. One common solution to this problem is to find an alpha shape. Intuitively, the alpha shape is the shape that emerges when you roll a disk of size \(1/\alpha\) around the points, such that it does not pass through any points. Whenever the disk touches two points, you have one edge of the shape.
Alpha shapes have some problems, however. For one, if you make your \(\alpha\) too small (and thus your disk too large), you end up with a convex hull. If you make the \(\alpha\) too big, it can “fall between” a gap in points, creating multiple different contours, which is exactly the problem we want to avoid. Fiddling with an \(\alpha\) value doesn’t sound fun (although there are some advanced variants that try to dynamically determine an optimal \(\alpha\) for different regions), but this line of searching led me to find the far less well known beta shape [pdf] algorithm.
In short, the beta shape algorithm is a convenient way of finding a well defined concave hull of a set of points. The algorithm itself is interesting enough that I will save its description for another blog post, but suffice it to say, it lets us get a single feature describing the shape of a logo, which ends up looking like this for our images from before:
For logos that are segmented into different shapes, we get something like this:
By defining the shape of a logo as a single contour (as well as discarding dissimilar aspect ratios), we’re now able to perform millions of logo comparisons (with OpenCV’s `matchShapes`) in a second on my six-year-old laptop!
Upon further reflection
If you’re like me from a couple of days ago, everything up to now has probably sounded reasonable. If, however, you know a bit about shape comparison, you’re probably shaking your head right now. That’s because you understand that `matchShapes` compares the Hu moments of contours, but Hu moments don’t depend on having a single shape – they can be calculated for any piecewise continuous function. All images fit this property, although a binary image (e.g. black outside the boundaries of the logo, and white inside) would best fit what we’re trying to accomplish. Better yet, there are some more sophisticated moments that can be used to compare shapes and give us better matches. Based on what I’ve seen, Hu moment matching gives very close results for most simple symmetrical shapes, e.g. squares match well with circles. Since many logos are simple symmetrical shapes, this poses problems in this domain. I see this as the number one area in which Token can improve at the moment.
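For context, this is roughly the computation `matchShapes` performs with its `CONTOURS_MATCH_I1` method – a pure-Python sketch, with made-up moment vectors standing in for real Hu moments:

```python
import math

def match_i1(hu_a, hu_b, eps=1e-12):
    """The distance behind OpenCV's CONTOURS_MATCH_I1: each Hu
    moment h is mapped to m = sign(h) * log10(|h|), and the
    distance is the sum of |1/m_a - 1/m_b| over the seven
    moments (near-zero moments are skipped)."""
    total = 0.0
    for a, b in zip(hu_a, hu_b):
        if abs(a) > eps and abs(b) > eps:
            ma = (1.0 if a > 0 else -1.0) * math.log10(abs(a))
            mb = (1.0 if b > 0 else -1.0) * math.log10(abs(b))
            total += abs(1.0 / ma - 1.0 / mb)
    return total

square = [1.7e-3, 2.1e-7, 0, 0, 0, 0, 0]   # made-up moment vectors
circle = [1.6e-3, 1.0e-9, 0, 0, 0, 0, 0]
print(match_i1(square, square))  # identical shapes → 0.0
print(match_i1(square, circle))  # a small distance: simple shapes sit close together
```

The log-scale mapping is what makes simple symmetrical shapes cluster so tightly: their leading moments are similar orders of magnitude, and the higher moments are nearly zero.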
While the outer shape of our logos is a good start, there are still many other things that make logos look more or less like each other. Colours clearly play a large role in how we perceive logos, so some colour matching was in order. While I started out doing some fairly standard OpenCV histogram calculation, I found that I got better results by compressing histogram values into a bit array, where each bit represents one of black, white, or a particular hue. The resulting matches correspond more intuitively to our sense of how logo colours can be similar (e.g. if two logos are both roughly green, we’ll think of them both as “green” logos). Aside from giving better matches, this also saved dozens of bytes per logo (i.e. dozens of megabytes in our final feature database) and allowed for a very fast XOR-based colour comparison.
Because it is common to render the transparent parts of logos on a white background by default, white is a special colour. And since the amount of white space is important to the feel of a logo (e.g. is it blocky and bold, or light feeling?), white is the only colour for which I keep the full (normalized) value, making it possible to directly compare the degree of whitespace between logos.
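A minimal sketch of that bit-array idea (the hue buckets and the 10% threshold here are assumptions; Token additionally keeps white’s full normalized value on the side):

```python
def colour_bits(hist, threshold=0.1):
    """Compress a normalized colour histogram into a bitmask:
    any bucket (black, white, or a hue band) whose weight
    clears the threshold sets one bit."""
    buckets = ["black", "white", "red", "yellow",
               "green", "cyan", "blue", "magenta"]
    bits = 0
    for i, name in enumerate(buckets):
        if hist.get(name, 0.0) >= threshold:
            bits |= 1 << i
    return bits

def colour_distance(bits_a, bits_b):
    """XOR the masks and count the differing bits - a cheap
    Hamming distance between two colour profiles."""
    return bin(bits_a ^ bits_b).count("1")

a = colour_bits({"white": 0.6, "green": 0.4})
b = colour_bits({"white": 0.5, "green": 0.3, "blue": 0.2})
print(colour_distance(a, b))  # → 1 (they differ only in the blue bit)
print(colour_distance(a, a))  # → 0
```

One byte per logo is enough for eight buckets, which is where the memory savings over full histograms come from.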
Further, while matching the outer shapes of logos is good, it would be nice to distinguish between, say, a circle with a square inside of it and a circle with a circle inside of it. Since OpenCV returns a hierarchy of contours, we can do just this. By looking exclusively at shapes that are not at the top-level of the hierarchy, we can get a sense of what’s going on in the interior of the logo. Token takes these interior shapes (at least ones that are larger than a certain amount) and does the same beta-shape finding on them. Not every logo has a distinctive interior, of course, so if our interior looks like the exterior, we simply get rid of it.
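This filtering is straightforward given the hierarchy `findContours` returns, where each contour gets a row of `[next, previous, first_child, parent]` indices and a parent of `-1` marks a top-level contour. A sketch (the minimum-area cutoff is an assumption):

```python
def polygon_area(points):
    """Shoelace formula for the (unsigned) area of a closed contour."""
    s = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0

def interior_contours(contours, hierarchy, min_area=100.0):
    """Pick out contours that sit inside another contour and are
    large enough to matter. Hierarchy rows follow OpenCV's
    [next, previous, first_child, parent] layout, with -1
    meaning "none"; parent == -1 marks a top-level contour."""
    return [c for c, (_, _, _, parent) in zip(contours, hierarchy)
            if parent != -1 and polygon_area(c) >= min_area]

# A square logo with a smaller square inside it: contour 1's parent is 0.
outer = [(0, 0), (50, 0), (50, 50), (0, 50)]
inner = [(10, 10), (30, 10), (30, 30), (10, 30)]
hierarchy = [[-1, -1, 1, -1],   # outer: top level, first child is 1
             [-1, -1, -1, 0]]   # inner: child of contour 0
print(len(interior_contours([outer, inner], hierarchy)))  # → 1 (just the inner square)
```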
Having these shapes gives us some bonus data, too. For instance, just the raw number of points tells us something about the complexity of the logo (given that all logos are scaled to the same size before processing). Perhaps more interesting is a feature that I’ve called the smoothness of the logo. When OpenCV finds contours, it gives back a list of points that are each separated by one pixel. This is frequently redundant, as any straight line can be described by only two points. Accordingly, OpenCV also provides a function, `approxPolyDP`, that returns a simplified version of the contour, using only as many points as necessary to describe it within a given tolerance. The smoothness is the number of points on this simplified contour, divided by the length of its perimeter. The greater this value, the more densely clustered the points are, and thus the more contoured, or smoother, the shape will appear. Conversely, a very low value, like what you’d get for a triangle, indicates a “sharp” shape.
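As a sketch, assuming the simplified contour comes back as a plain list of points:

```python
import math

def perimeter(points):
    """Total length of a closed contour."""
    return sum(math.dist(points[i], points[(i + 1) % len(points)])
               for i in range(len(points)))

def smoothness(simplified):
    """Points surviving simplification, per unit of perimeter:
    low for shapes a few points fully describe ("sharp"), high
    for curves that stay point-dense even after simplification."""
    return len(simplified) / perimeter(simplified)

triangle = [(0, 0), (100, 0), (50, 80)]          # 3 points suffice
circle = [(50 + 50 * math.cos(2 * math.pi * t / 64),
           50 + 50 * math.sin(2 * math.pi * t / 64))
          for t in range(64)]                    # stays point-dense
print(smoothness(circle) > smoothness(triangle))  # → True
```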
Once all of these features are decided upon, all that’s left is to build a distance function: a function that takes in two sets of features and returns the distance between them. A distance of zero indicates that the features are identical (and thus, presumably, the logos). Greater distances indicate less and less similar logos.
For Token’s distance function, I wanted a strong score for any outer beta shapes that match well – since it’s these shapes that carry the greatest visual impact of the logo. Other features such as colour should influence the order of the results, but they shouldn’t be enough to eliminate a good result. For this reason, I grouped the feature distances into two: the outer shape, and everything else. The outer shape is given the most weight, while everything else is able to scale that score up and down.
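A sketch of that two-group split (the weights are placeholders, not Token’s hand-tuned constants):

```python
# Placeholder weights for the secondary features.
COLOUR_WEIGHT = 0.3
SMOOTH_WEIGHT = 0.2

def logo_distance(shape_dist, colour_dist, smooth_dist):
    """Combine per-feature distances. The outer-shape distance
    carries the score; the secondary distances only scale it up,
    so they reorder results but can't eliminate a strong shape
    match (a shape distance near zero stays near zero)."""
    return shape_dist * (1.0 + COLOUR_WEIGHT * colour_dist
                             + SMOOTH_WEIGHT * smooth_dist)

# A near-perfect shape match survives clashing colours...
print(logo_distance(0.01, 3.0, 0.0) < logo_distance(2.0, 0.0, 0.0))  # → True
```

The multiplicative form is what enforces the grouping: no amount of colour mismatch can push a great shape match out of the top results, and matching colours can’t rescue a terrible one.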
Matching and Serving
Once the image feature extraction and comparison functions are worked out, things get fairly ordinary. I opted to try out the trendy MEAN stack for Token. My first step was to use Express to create an API with which I could organize logos. The (REST) API has operations for adding, modifying, deleting, and querying logos.² When a logo is added, the image is first resized and stored, then the feature extraction method is called (which is implemented as a Node addon in C++). The features are stored in the database, along with other information about the logo, which completes the operation.
A search operation is fairly similar. Calling the search method results in the logo in question being downloaded onto the server, where the same feature extraction process takes place. In the case of a search, though, these features are then compared against the features in the database using the distance function, and the closest matches get returned.
The server is also used to serve up Token’s front end. All of the front end work happens at the client side, where Angular interacts with the API to assemble the content that gets rendered.
Token is hosted as an Amazon Elastic Beanstalk instance. While this style of hosting offers a lot of perks, getting it set up for Token took a bit of work. For a Node app, EB does the following: it sets up your environment (based on your configuration settings), unpacks your application, runs `npm install`, then `npm start`. For most apps this makes things easy. For apps that need to compile a custom extension, getting the environment into a state where those latter two commands can run is the fun part. The rest of this section is some fairly application-specific detail on how I set things up, so feel free to skip ahead unless you’re into this.
The big hurdle came from the fact that Token has C++ code that needs to be compiled upon deployment. This code depends on libraries that are not often packaged (OpenCV and the MongoDB C++ driver), so getting those dependencies ready was the first step. This was done by compiling them on an EC2 instance, compressing the output files into an archive, and putting that archive into an S3 bucket (so much Amazon jargon!).
Getting these compiled files onto an instance proved to be easy after this. Elastic Beanstalk environments are customized through configuration files which provide various directives for performing different tasks. The `sources` directive allows you to point at an external file that you want installed into a given directory. When the file is an archive, it is automatically expanded.
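As an illustration, a `.ebextensions` config file along these lines does the job (the bucket name and target path here are hypothetical):

```yaml
# .ebextensions/01-deps.config -- hypothetical bucket and target path
sources:
  # The archive is downloaded and, being a tarball, is automatically
  # expanded into the target directory.
  /home/ec2-user/token-deps: https://s3.amazonaws.com/example-bucket/opencv-deps.tar.gz
```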
There was still plenty more to do, though. For instance, Token uses Compass to compile its SCSS, which gets installed (via the `rubygems` config directive) into a directory that isn’t in the application’s path, so some linking was in order – also done in a config file.
MongoDB takes another config file to get installed, and there’s another file that sets some permissions, as well as pointing Elastic Beanstalk’s proxy server at the public folder. Finally, there’s a `post-install` command that runs only on AWS to pre-compile the SCSS and move some resources so the proxy server can find them.
This might sound like a lot of work to get things set up, and it was. There were a couple of gotchas that slowed me down a good deal, too. The first was that Elastic Beanstalk will try to run the file `app.js` directly before falling back to `npm start`. While I had a file called `app.js` (created by Express’s application generator tool), it was not the proper entry point of the application. The solution was to rename this file so EB would run `npm start` as we wanted.
The other trick was figuring out how to set `npm` to run with “unsafe permissions”, which was necessary in order to compile the addon via `npm install`. While `npm` ostensibly has four different ways of setting this option, the only one that worked for the version that EB uses was the least well documented way: creating an `.npmrc` file with the line `unsafe-perm = true` in it. After doing so, `npm install` was able to compile my addon without problems.
If you want to see what went into my EB environment configuration, you can check it out in the source.
At the moment Token is still a proof of concept. For starters, the results that get returned are okay, but they certainly could be a lot better. Simple, very symmetrical shapes match well with other simple symmetrical shapes (like squares matching with circles), which causes a bit too much noise in certain results. The OpenCV contour matching algorithm, which compares Hu moments, is weak in this case. I have hopes that moving to a different shape matching algorithm, such as using Zernike moments, will improve upon this.
There are also lots of cool areas that could be expanded into, such as using machine learning to improve the feature distance calculation (right now it uses a bunch of poorly hand-tuned constants), or distributing the searches to speed things up. There’s also the possibility that people’s searches could be used to forecast the direction in which logos are trending, which I feel is an interesting prospect.
There are also some simpler ways in which Token could improve. Simply upgrading the type of EC2 instance it’s running on should drastically speed it up, for starters (there’s a 10x slowdown between my laptop and the micro instance that Token is running on). Interface-wise there’s lots that could be done to provide more value, such as being able to narrow down search results to particular categories (e.g. I only want to see logos from sports teams), and providing more detailed information about logos and their associated organizations.
¹ This is an entirely unsupported number, but I think it’s the right order of magnitude. Small businesses (fewer than 100 employees) make up the majority of jobs, at least in North America, so it might be tempting to say that there would be on the order of 1 brand per 100 people. However, since many of these businesses are short lived, limited to a very small market area, or not branded at all, I’d estimate that you won’t find more than 1 “meaningful” brand per 1000 people.↩
² The API is a bit more complex than this, since it also represents organizations. Each organization has a list of logos that belong to it, as well as some other data.↩