marcus welz

Finding Duplicate or Similar Images with Perceptual Hashing in PHP

Posted on November 22, 2010

A while ago on Reddit someone asked how Tineye works. It's pretty fascinating; you upload a photo (or point it at a URL of an image) and it'll find other locations with similar images — if they've been indexed. Even if those images are in different sizes, or have had minor changes made to them, be it due to compression or because someone added or removed some text. So in a way, it's a fuzzy image based search engine.

Although I'm sure tineye has it's own set of algorithms and custom applications to drive all this, something similar, if crude in implementation, can be achieved with available software.

At the center of all this is an image hashing algorithm. Usually (cryptographic) hashes are designed to detect even the slightest modifications and return a completely different hash. We're looking for the opposite, and libphash delivers:

The phash library implements a "perceptual hashing" algorithm. From their site:

A perceptual hash is a fingerprint of a multimedia file derived from various features from its content. Unlike cryptographic hash functions which rely on the avalanche effect of small changes in input leading to drastic changes in the output, perceptual hashes are "close" to one another if the features are similar.

So the idea is that you feed it an image, and it'll return a hash, and the similar two hashes are to one another, the more identical the images are. The comparison of similarity is done by calculating the hamming distance, which in a way is the bit-level version of the levenstein() function that PHP developers may already be familiar with.

Even better, the phash library comes with PHP bindings, and provides a few functions to get you started. For instance, there's the ph_image_hash() function. Simply give it a filename of an image, and it'll return, well, a resource. Now this really puts a damper on the usefulness of that function, since resources are fairly opaque and hard if not impossible to work with.

Fear not, I've made a few changes so that ph_image_hash() returns a plain string with the hexadecimal representation of the hash, which can then be stored in a database, for instance. You can grab phash from my github repository.

Alright, you've got a way to get to those hashes, now how does one index them and look them up in a speedy way? Well, this is where it gets interesting, and unfortunately a little bit theoretical.

Ideally, you'll store these hashes (and other meta data) in a database, but not a SQL database. You really want a vantage point tree, or better, multiple vantage point tree. Essentially these are binary trees that build up based on the distance of hashes. The idea is that hashes that are similar are close together. So you traverse the tree in order to get close to your match, and then just "look around" in that area and you'll likely find similar results.

The MVP tree area gets pretty academic, and from what I've looked at so far, most of it is theoretical, presented as limited in applicability, but at the same time seems to be exactly what such a fuzzy image search engine would need to be based on. I'm fairly certain companies working on various aspects of image recognition and augmented reality and what not are all messing with this sort of thing, so there's likely very little incentive for them to publicize or advertise their algorithms.

The phash library does include a "MVPTree" library with basic examples of how data is stored. Having something like this built out into a scalable data store with an HTTP interface a la SOLR would be fantastic.

I'd immediately work on a PHP application to index my photos, detect duplicates, etc.

Print This Post Print This Post
Filed under: PHP No Comments

HTML5 Game - Connecting to the server

Posted on October 17, 2010
This entry is part 7 of 7 in the series HTML5 and Erlang Game Development

I'm prototyping a multiplayer HTML5 game. The client is the HTML5 concoction of JavaScript, CSS, and whatever one might lump into the "HTML5" technology stack, while the server is the "concurrency oriented" Erlang and mochiweb stack.

The browser will then communicate with the web server using AJAX calls. Eventually this could all be done using WebSockets, but we're not there yet. WebSockets have just experienced a set back due to a problem with the protocol.

With simple AJAX calls (which aren't really AJAX anyway, since JSON is so much nicer to work with) there are two ways to get data from the server.

One is polling. Often. Every second or so. This is rather wasteful, but it's easy to implement. In PHP, for instance, polling often is pretty much the only option since the connection to the server ties up an Apache child process. The PHP portions needs to be as quickly as possible in order to free up the resources for the next person to be able to bang away at the server.

The other option is long-polling (AKA COMET). In this case the browser connects to the server, which then holds onto that connection until data is ready to be sent back. This eats the socket, and whatever additional resources are required by the webserver. And since Erlang appears to be really good at this, I'll do that.

So for the purpose of communicating with the server, I'll use two types of connections. One is the "event listener", waiting for data coming from the web server, that's my long-polling COMET call. Whenever there's data the server wants me to have, it'll send it and then close the connection. On the client side I'll process it, and then immediately fire off the next COMET call, waiting for more data. Mochiweb actually comes with an example of long-polling.

The other connection type is the "action connection". It'll fire for every command that the client is sending to the server. So when I click something in the browser, that'll result in an action call to the server. For that I'll use the regular jQuery .ajax() call.

So essentially to receive data from the server I have my listener AJAX call, and to send data to the server I use my action AJAX call.

Erlang Processes

I'll need an Erlang process that represents my client. It stores my session data, but a bit more than that, actually. I'll basically store all relevant client view state and interact with the rest of the server on the browser's behalf.

So for each client there's a session process. This process gets created when the client makes a login call:

Once this Session process exists, the listener will connect to the server. The server will hold onto the connection, until data is available. The data is then returned to the browser:

Finally, the action AJAX call. It just fires off an action and closes the connection. The Listener Pid here is the hibernating mochiweb process from the previous sequence diagram:

It's rather quick, too. The first feature I implemented was the "ping". JavaScript sends a ping command, along with the current time.

metapoly.net =
{
    action: function(action, params) {
        $.post('/action/' + action, params);
    },
};

metapoly.net.action("ping", {time: (new Date()).getTime()});

The AJAX call will connect to the mochiweb server, which creates a new process to handle the request. The process will then look up which "client view" process is responsible for handling the web browser commands, this is stored in ets. The "client view" process is then sent the ping message. The client view process pattern matches it, and tries to send a response "pong" to the listener process, which is the COMET connection.

% actions such as mouse clicks, text events, etc.
handle_action(Payload, State) ->
    case Payload of
        {"ping", Params} ->
            Pong = {struct, [{'_t', mpu:now()}, {"c", "pong"}|Params]},
            io:format("received ping, queueing pong reply: ~p ~p~n", [Params, Pong]),
            NewState = State#state{messages = [Pong | State#state.messages]}
    end,
    ?MODULE:loop(NewState).

As soon as the listener has the command, it'll return it to the client. The JavaScript client will invoke the callback, get the new current time, do a diff with the time the ping was sent and displays the results in a div tag.

metapoly.net.e('pong', function(msg) {
    var diff = (new Date()).getTime() - msg.time;
    $('#ms').text(diff + "ms lag");
});

On localhost, and using Chrome, the fastest I've seen was a 6ms response time. If I hammer the button, it'll come back somewhere between 8 - 20ms. On Firefox, with Firebug open, it'll bounce around between 30ms and 150ms.

With that, the consider the first conversation with the server successful.

Print This Post Print This Post

HTML5 Large Random Map test

Posted on October 9, 2010
This entry is part 6 of 7 in the series HTML5 and Erlang Game Development

This time it's building random patches. And I've added a layer for sprites. There's two of them and they're static, so again it's not too exciting. I did notice, however, that I'm running into serious performance issues when building these patches. So this technique isn't going to fly.

Maybe it's faster to farm out the work to web worker threads, but they don't have access to the DOM and can't use any HTML elements. So I'd have to pass everything as an array (from getImageData()) , let the web worker assemble the patch, which is then handed back as an array. I haven't tried that yet.

The other possibility is to have the server generate the patches. Google maps style. The immediate downside is that I'm no longer saving bandwidth, since the same tiles that were previously displayed on the client are now generated by the server. Maybe patches could be hashed and different areas of the map could be served by the same image, if the hash matches. That would also involve some Erlang. So far it's all just been JavaScript.

Print This Post Print This Post

HTML5 Map Tile test

Posted on September 25, 2010
This entry is part 4 of 7 in the series HTML5 and Erlang Game Development

My next thing was trying to build a simple tile-based terrain. I don't really do graphics, so I borrowed tiles from Zelda. RPGmaker related sites have a ton of tile sets. Some more usable than others.

These tiles are 16×16. I wanted something with dimensions of power-of-two. Just in case this thing escalates into a native client or, more likely, using WebGL. The very first thing is just having a bunch of tags arranged as map tiles.

There isn't all that much going on here. A bit of CSS to define the map tiles along with a snippet of JavaScript to just render a bunch of div tags with random map tiles.

I used CSS to build a bunch of classes, one for each tile in the tileset to minimize the amount of specifications that were needed in the div tags' style attribute. The only thing that I wanted in there was top, left, and (later) z-index. Everything else goes into the class.

I don't even bother testing in IE at the moment. I do the development in Firefox (due to Firebug and all), and test in Chrome (as a best-case scenario test, I guess).

Alright, so rendering a static map is neither hard nor exciting. Having the ability to scroll the thing around a bit would be nice. So that's next.

Print This Post Print This Post

HTML5 - Canvas Clickmap Test

Posted on September 18, 2010
This entry is part 3 of 7 in the series HTML5 and Erlang Game Development

So, one of the prototypes was to test using canvas to build a clickmap that detects whether a mouse click occurred on a visible pixel, rather than a transparent part of the image. It's one of the techniques presented in the Building a game engine with jQuery that was used in the Google Tech Talk.

I looked at the slides for the code (it's on slide #50). Simple stuff, it needed some tweaks though. The idea is that if the pixel is transparent, the click shouldn't be registered as such. Essentially it means extracting the alpha channel from the image data, storing it in an array and using it to scan for click hits.

And this is pretty much the first step it took for me to get a taste of what's ahead. I'm no stranger to game development, the web, JavaScript, nor the canvas tag, but I was intrigued and curious.

The embedded page on the right with the character sprites (borrowed from Secret Of Mana) is the clickmap test. Try it yourself. Only clicks on colored pixels should register and result in a JavaScript alert.

The next thing I was wondering about is how they'd handle overlapping images, and detecting which one was clicked. The first thought was DOM event bubbling, because the tech talks mentioned heavy use of delegations, but that makes no sense since the various sprites are siblings, and not arranged in any kind of hierarchical way.

The obvious answer is to quickly iterate over all the sprites (maybe optimize with quadtree), do collision check with the mouse click via the clickmap, and give precedence to the top-most element.

So now I need a playground where I can put these sprites. The next experiment will be something map related, how hard can it be…

Print This Post Print This Post