marcus welz

Quick and dirty sometimes is better

Posted on March 12, 2011

First when I threw together the Metapoly project, I spent some time getting the long-polling AJAX pieces right, followed by quite a bit of Erlang code to get the simulation working. I had lots of io:format() calls dumping out various bits of data. The simulation seemed to work right, but I had to visualize everything in my head, because the missing piece was the client to handle the rendering.

The key component on the server side that lets the browser communicate with the game engine is the client view. Essentially it's a component that represents the client on the server. It keeps track of the client state and marshals messages back and forth.

Communication is done using two types of AJAX calls. One is the listener, and another sends each action performed on the client to the server via a separate call. Keeping them independent was to make each more responsive.

Because I was so eager to get something work and on the screen, the client_view component was just thrown together. There's no hint of OTP anywhere. There's not even really a decent API. Other processes literally just use the bang syntax to send messages, hardcoded in whatever module they originate from.

Right now, though, there's some bug resulting in crash dumps. It takes a while, and lots of traffic, in order for it to occur, but when it does, werl.exe is already using up 900mb+, and the's some kind of heap/stack memory allocation issue in the client_view component. I couldn't find anything obvious in the module, but I thought maybe I screwed up some tail recursion call somewhere. The crash dump mentions something about "old_heap", and googling for more info on it has been fairly fruitless. A few posts mention something vague from folks that write Wings3D plugins. but I couldn't narrow anything down just yet. Either way, I figured I'd rewrite the module. Might as well make it proper, and OTP compliant.

So after a few hours of rigging, and also improving the components behavior, it now doesn't scale anywhere near as well as the previously hacked together one. Fantastic.

Originally, the client_view process, of which there is exactly one per each user "session", would queue up messages from the game world. When a listener (that long-polling AJAX call) finally connected, all the pending messages would get sent. I did want to add in some kind of mechanism to ensure broken connections, however, so that if an AJAX call somehow gets interrupted, the next AJAX call could re-retrieve the same messages again.

I half-implemented this by having the client-side record the timestamp of the most recent message it received, and having it send that timestamp with the next comet request. The idea was to have the client_view use that timestamp to acknowledge the client's receipt and prune all those old messages, and then send any pending new messages. But it turns out that I had never gotten around to adding that logic. Indeed, I was simply sending messages to the listener process, and then no longer tracked the listener as part of the client_view state, knowing that the data would immediately be sent back to the client anyway, and that the listener would go away.

I didn't like that approach, though, because the client_view is making assumptions about the listener. And if the listener was a web socket, the assumption had been incorrect. So instead, I inverted the logic. The client_view keeps sending data to the listener, and the listener itself would notify the client_view that it was going away, and unregistered from the client_view process. This, however, adds overhead. And I also added the ability to have a several listeners registered waiting for messages. This probably also adds overhead, although it shouldn't be that much.

Finally, because I figured that one listener acknowledging receipt of a message doesn't mean another listener (heh, why?!) didn't still need that message. So, I essentially over-engineered this beast a bit. And it's very visible. Although the implementation looks cleaner, the approach isn't as fast. But to some extent it unveils protocol issues that would appear in a somewhat laggy environment. There's no client side interpolation / smoothing of events.

For now the solution is to remove the multiple-listener feature, but ultimately, I think this lays bare some of the client-side tweaks that need to be added before this becomes as smooth as I want it to be in order to work with unknown latencies. Either way, this change is gonna stay in its branch for now.

Print This Post Print This Post
Filed under: Gaming, Social Web No Comments

Finally bought a Solid State Drive

Posted on December 8, 2010

As an early Christmas present to myself, I took the plunge and finally bought two small 2.5" 60GB solid state drives. They were quite a bit smaller and lighter than I imagined. But in terms of performance, there's quite a difference.

I normally run everything RAID-1, but since everything important is now stored on RAID-1 backed NAS if not Amazon's S3, I figured I could take a little risk for the desktop and run the SSDs in RAID-0. I did some benchmarks to compare. An old 300GB ATA drive did around 50MB/s. My previous 1TB SATA RAID-1 setup performed somewhat better at around 70MB/s. The RAID-0 SSD setup wins by a long shot, with around 320MB/s.

This setup reduces my desktop drive space quite a bit, but that's a good thing. Perhaps it'll be more of an incentive to keep everything neatly on the NAS devices.

Print This Post Print This Post
Filed under: Hardware No Comments

Finding Duplicate or Similar Images with Perceptual Hashing in PHP

Posted on November 22, 2010

A while ago on Reddit someone asked how Tineye works. It's pretty fascinating; you upload a photo (or point it at a URL of an image) and it'll find other locations with similar images — if they've been indexed. Even if those images are in different sizes, or have had minor changes made to them, be it due to compression or because someone added or removed some text. So in a way, it's a fuzzy image based search engine.

Although I'm sure tineye has it's own set of algorithms and custom applications to drive all this, something similar, if crude in implementation, can be achieved with available software.

At the center of all this is an image hashing algorithm. Usually (cryptographic) hashes are designed to detect even the slightest modifications and return a completely different hash. We're looking for the opposite, and libphash delivers:

The phash library implements a "perceptual hashing" algorithm. From their site:

A perceptual hash is a fingerprint of a multimedia file derived from various features from its content. Unlike cryptographic hash functions which rely on the avalanche effect of small changes in input leading to drastic changes in the output, perceptual hashes are "close" to one another if the features are similar.

So the idea is that you feed it an image, and it'll return a hash, and the similar two hashes are to one another, the more identical the images are. The comparison of similarity is done by calculating the hamming distance, which in a way is the bit-level version of the levenstein() function that PHP developers may already be familiar with.

Even better, the phash library comes with PHP bindings, and provides a few functions to get you started. For instance, there's the ph_image_hash() function. Simply give it a filename of an image, and it'll return, well, a resource. Now this really puts a damper on the usefulness of that function, since resources are fairly opaque and hard if not impossible to work with.

Fear not, I've made a few changes so that ph_image_hash() returns a plain string with the hexadecimal representation of the hash, which can then be stored in a database, for instance. You can grab phash from my github repository.

Alright, you've got a way to get to those hashes, now how does one index them and look them up in a speedy way? Well, this is where it gets interesting, and unfortunately a little bit theoretical.

Ideally, you'll store these hashes (and other meta data) in a database, but not a SQL database. You really want a vantage point tree, or better, multiple vantage point tree. Essentially these are binary trees that build up based on the distance of hashes. The idea is that hashes that are similar are close together. So you traverse the tree in order to get close to your match, and then just "look around" in that area and you'll likely find similar results.

The MVP tree area gets pretty academic, and from what I've looked at so far, most of it is theoretical, presented as limited in applicability, but at the same time seems to be exactly what such a fuzzy image search engine would need to be based on. I'm fairly certain companies working on various aspects of image recognition and augmented reality and what not are all messing with this sort of thing, so there's likely very little incentive for them to publicize or advertise their algorithms.

The phash library does include a "MVPTree" library with basic examples of how data is stored. Having something like this built out into a scalable data store with an HTTP interface a la SOLR would be fantastic.

I'd immediately work on a PHP application to index my photos, detect duplicates, etc.

Print This Post Print This Post
Filed under: PHP 2 Comments

HTML5 Game - Connecting to the server

Posted on October 17, 2010
This entry is part 7 of 7 in the series HTML5 and Erlang Game Development

I'm prototyping a multiplayer HTML5 game. The client is the HTML5 concoction of JavaScript, CSS, and whatever one might lump into the "HTML5" technology stack, while the server is the "concurrency oriented" Erlang and mochiweb stack.

The browser will then communicate with the web server using AJAX calls. Eventually this could all be done using WebSockets, but we're not there yet. WebSockets have just experienced a set back due to a problem with the protocol.

With simple AJAX calls (which aren't really AJAX anyway, since JSON is so much nicer to work with) there are two ways to get data from the server.

One is polling. Often. Every second or so. This is rather wasteful, but it's easy to implement. In PHP, for instance, polling often is pretty much the only option since the connection to the server ties up an Apache child process. The PHP portions needs to be as quickly as possible in order to free up the resources for the next person to be able to bang away at the server.

The other option is long-polling (AKA COMET). In this case the browser connects to the server, which then holds onto that connection until data is ready to be sent back. This eats the socket, and whatever additional resources are required by the webserver. And since Erlang appears to be really good at this, I'll do that.

So for the purpose of communicating with the server, I'll use two types of connections. One is the "event listener", waiting for data coming from the web server, that's my long-polling COMET call. Whenever there's data the server wants me to have, it'll send it and then close the connection. On the client side I'll process it, and then immediately fire off the next COMET call, waiting for more data. Mochiweb actually comes with an example of long-polling.

The other connection type is the "action connection". It'll fire for every command that the client is sending to the server. So when I click something in the browser, that'll result in an action call to the server. For that I'll use the regular jQuery .ajax() call.

So essentially to receive data from the server I have my listener AJAX call, and to send data to the server I use my action AJAX call.

Erlang Processes

I'll need an Erlang process that represents my client. It stores my session data, but a bit more than that, actually. I'll basically store all relevant client view state and interact with the rest of the server on the browser's behalf.

So for each client there's a session process. This process gets created when the client makes a login call:

Once this Session process exists, the listener will connect to the server. The server will hold onto the connection, until data is available. The data is then returned to the browser:

Finally, the action AJAX call. It just fires off an action and closes the connection. The Listener Pid here is the hibernating mochiweb process from the previous sequence diagram:

It's rather quick, too. The first feature I implemented was the "ping". JavaScript sends a ping command, along with the current time.

metapoly.net =
{
    action: function(action, params) {
        $.post('/action/' + action, params);
    },
};

metapoly.net.action("ping", {time: (new Date()).getTime()});

The AJAX call will connect to the mochiweb server, which creates a new process to handle the request. The process will then look up which "client view" process is responsible for handling the web browser commands, this is stored in ets. The "client view" process is then sent the ping message. The client view process pattern matches it, and tries to send a response "pong" to the listener process, which is the COMET connection.

% actions such as mouse clicks, text events, etc.
handle_action(Payload, State) ->
    case Payload of
        {"ping", Params} ->
            Pong = {struct, [{'_t', mpu:now()}, {"c", "pong"}|Params]},
            io:format("received ping, queueing pong reply: ~p ~p~n", [Params, Pong]),
            NewState = State#state{messages = [Pong | State#state.messages]}
    end,
    ?MODULE:loop(NewState).

As soon as the listener has the command, it'll return it to the client. The JavaScript client will invoke the callback, get the new current time, do a diff with the time the ping was sent and displays the results in a div tag.

metapoly.net.e('pong', function(msg) {
    var diff = (new Date()).getTime() - msg.time;
    $('#ms').text(diff + "ms lag");
});

On localhost, and using Chrome, the fastest I've seen was a 6ms response time. If I hammer the button, it'll come back somewhere between 8 - 20ms. On Firefox, with Firebug open, it'll bounce around between 30ms and 150ms.

With that, the consider the first conversation with the server successful.

Print This Post Print This Post

HTML5 Large Random Map test

Posted on October 9, 2010
This entry is part 6 of 7 in the series HTML5 and Erlang Game Development

This time it's building random patches. And I've added a layer for sprites. There's two of them and they're static, so again it's not too exciting. I did notice, however, that I'm running into serious performance issues when building these patches. So this technique isn't going to fly.

Maybe it's faster to farm out the work to web worker threads, but they don't have access to the DOM and can't use any HTML elements. So I'd have to pass everything as an array (from getImageData()) , let the web worker assemble the patch, which is then handed back as an array. I haven't tried that yet.

The other possibility is to have the server generate the patches. Google maps style. The immediate downside is that I'm no longer saving bandwidth, since the same tiles that were previously displayed on the client are now generated by the server. Maybe patches could be hashed and different areas of the map could be served by the same image, if the hash matches. That would also involve some Erlang. So far it's all just been JavaScript.

Print This Post Print This Post