marcus welz

Android: Ducking Audio with TextToSpeech

Posted on January 22, 2012

The Android app I'm working on makes use of TextToSpeech. It's also very likely that the user is playing music in the background. In order for the music to not drown out the TTS output, we'll make use of audio ducking, a feature that's "new" as of API level 8 (Android 2.2).

The ingredients needed for this are an instance of the TextToSpeech class and the AudioManager.

In order to get spoken text, we use TextToSpeech.speak(String text, int mode, HashMap<String,String> params);

Now, in order to properly get any background music to duck when we start speaking, and raise the volume when we're done speaking, we'll need to know when the TTS is done blabbering. And for this, we can register a callback via TextToSpeech.setOnUtteranceCompletedListener(). There are, however, a few pitfalls that we'll need to watch out for:

  1. TextToSpeech.setOnUtteranceCompletedListener() MUST be invoked after onInit() was called. Otherwise it won't necessarily register properly, and onUtteranceCompleted() may never be called.
  2. When invoking TextToSpeech.speak(), we must provide an "Utterance ID" via the params HashMap using TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID. If there's no ID, onUtteranceCompleted() will not be invoked.
  3. Because it is possible to queue up more utterances via TextToSpeech.speak() before the previous one has completed, we'll need to track how many we've queued up, so we know to only release audio focus after all of them have been completed. Otherwise we've got a race condition where we queued up two things to speak, but lose focus after the first one has completed. The result is a mess where music will cut in and out while text is being spoken.
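
The bookkeeping from pitfall 3 can be sketched in isolation. This is plain Java (8 or later, for the lambdas) with no Android dependencies; the class name UtteranceFocusTracker and its two callbacks are made up for illustration and are not part of any Android API. In a real app the callbacks would wrap requestAudioFocus() and abandonAudioFocus().

```java
/**
 * Hypothetical helper: counts utterances that were queued but not yet
 * completed, and fires a callback only on the 0 -> 1 and 1 -> 0 transitions.
 */
class UtteranceFocusTracker {
    private final Runnable onFirstQueued;   // e.g. request audio focus (duck the music)
    private final Runnable onLastCompleted; // e.g. abandon audio focus (restore the music)
    private int inFlight = 0;

    UtteranceFocusTracker(Runnable onFirstQueued, Runnable onLastCompleted) {
        this.onFirstQueued = onFirstQueued;
        this.onLastCompleted = onLastCompleted;
    }

    /** Call right before handing an utterance to the TTS engine. */
    synchronized void utteranceQueued() {
        if (inFlight == 0) onFirstQueued.run();
        inFlight++;
    }

    /** Call from the utterance-completed callback (which may run on another thread). */
    synchronized void utteranceCompleted() {
        if (inFlight == 0) return; // stray callback, nothing to release
        inFlight--;
        if (inFlight == 0) onLastCompleted.run();
    }

    synchronized int inFlight() {
        return inFlight;
    }
}

class UtteranceFocusTrackerDemo {
    public static void main(String[] args) {
        UtteranceFocusTracker tracker = new UtteranceFocusTracker(
                () -> System.out.println("request focus"),
                () -> System.out.println("abandon focus"));
        tracker.utteranceQueued();    // prints "request focus"
        tracker.utteranceQueued();    // second utterance, focus is already held
        tracker.utteranceCompleted(); // one utterance still pending, nothing printed
        tracker.utteranceCompleted(); // prints "abandon focus"
    }
}
```

The methods are synchronized because the queueing and the completion callback are not guaranteed to happen on the same thread.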

With this in mind, we'll create a wrapper class which will encapsulate this functionality.

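import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

import android.content.Context;
import android.media.AudioManager;
import android.speech.tts.TextToSpeech;
import android.util.Log;

import com.google.inject.Inject;

public class DuckingTTS {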

    // Log tag
    private static final String TAG = DuckingTTS.class.getName();

    // Debugging is on (set to false to cut down on spam)
    private static final boolean D = false;

    /**
     * Which audio stream to use. We use the music one here,
     * so that it'll be "in line" with the music volume from whatever is playing.
     */
    private final int STREAM_TYPE = AudioManager.STREAM_MUSIC;

    /**
     * Parameters we're feeding with each invocation of speak()
     */
    private HashMap<String, String> ttsParams;

    /**
     * How many utterances are playing at a particular moment.
     */
    private int mUtterancesPlaying = 0;

    /**
     * The text-to-speech engine
     */
    private TextToSpeech mTts;

    /**
     * Whether TTS is initialized (onInit() called yet?)
     */
    private boolean mIsInitialized = false;

    /**
     * Queue up chatter when TTS is not initialized yet
     */
    private List<String> queue = new ArrayList<String>();

    /**
     * AudioManager injected via RoboGuice
     */
    @Inject
    private AudioManager mAm;

    /**
     * Context will be injected via RoboGuice
     * @param context
     */
    @Inject
    public DuckingTTS(Context context) {

         ttsParams = new HashMap<String, String>();
         ttsParams.put(TextToSpeech.Engine.KEY_PARAM_STREAM, String.valueOf(STREAM_TYPE));
         ttsParams.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, "ID");

         mTts = new TextToSpeech(context, ttsOnInitListener);
         mIsInitialized = false;
    }

    public boolean speak(String text) {

        if (!mIsInitialized) {
            if (D) Log.d(TAG, "speak(\"" + text + "\") - queued");
            queue.add(text);
            return false;
        }

        if (mAm != null) {
            // If this is the first utterance (i.e. we're not already talking),
            // request audio focus w/ducking before we start speaking.
            if (mUtterancesPlaying < 1) {
                int status = mAm.requestAudioFocus(audioFocus, STREAM_TYPE, AudioManager.AUDIOFOCUS_GAIN_TRANSIENT_MAY_DUCK);

                if (status == AudioManager.AUDIOFOCUS_REQUEST_FAILED) {
                    Log.e(TAG, "speak() audio focus request failed.");
                }
            }
            // Count the utterance before handing it to the engine, so the
            // completion callback can't run ahead of the increment.
            mUtterancesPlaying++;
        }

        // Tell it to speak
        if (D) Log.d(TAG, "speak(\"" + text + "\") - sending to TTS");
        mTts.speak(text, TextToSpeech.QUEUE_ADD, ttsParams);

        return true;
    }

    public void shutdown() {
        mTts.shutdown();
    }

    private TextToSpeech.OnInitListener ttsOnInitListener =
            new TextToSpeech.OnInitListener() {

        /**
         * Callback for when the TextToSpeech engine was initialized.
         * Result will tell us whether this was successful or not.
         *
         * @param status
         */
        public void onInit(int status) {

            if (status != TextToSpeech.SUCCESS) {
                return; // we just abort on failure, it's never fully initialized
                // this can be bad, by the way, because every speak() call will now add something to the queue.
            }

            mIsInitialized = true;
            mTts.setOnUtteranceCompletedListener(ttsOnUtteranceCompletedListener);

            // Also speak anything that was queued up so far, then empty the queue.
            for (String text : queue) {
                speak(text);
            }
            queue.clear();
        }

    };

    private TextToSpeech.OnUtteranceCompletedListener ttsOnUtteranceCompletedListener =
            new TextToSpeech.OnUtteranceCompletedListener() {

        /**
         * Callback when TTS has completed an utterance.
         */
        public void onUtteranceCompleted(String utteranceId) {
            if (D) Log.d(TAG, "onUtteranceCompleted(\"" + utteranceId + "\")");
            mUtterancesPlaying--;

            if (mAm == null) return;

            // once we're done speaking, lose audio focus.
            if (mUtterancesPlaying < 1) {
                mUtterancesPlaying = 0;
                mAm.abandonAudioFocus(audioFocus);
            }
        }

    };

    private AudioManager.OnAudioFocusChangeListener audioFocus =
            new AudioManager.OnAudioFocusChangeListener() {

        public void onAudioFocusChange(int focusChange) {

            // I don't think we actually care.
            if (D) Log.d(TAG, "onAudioFocusChange(" + focusChange + ")");

        }
    };

}

That's it. I use it in a service, and I wire it in with a simple annotation using RoboGuice:

public class GpsService extends RoboService {

    @Inject
    private DuckingTTS mDuckingTTS;

    // ... rest of the class implementation ...
}

As the final disclaimer: The class works for me and my purposes at the moment, but it doesn't handle every error scenario. Also, don't forget to call shutdown() to release the TTS resources.

Filed under: Android No Comments

Android: Getting started with RoboGuice 2.0 (beta 3)

Posted on January 8, 2012

When I started messing around with Android, it consisted mostly of copying and pasting example code together to quickly get some results. That works, but the unfortunate side effect is that the Activity or Service class balloons out with functionality and features that are better off encapsulated according to proper object oriented concepts and best practices and what not.

However, once I started spending more time modeling classes, I realized that there are an awful lot of cases where you need to pass around contexts in order to get access to service providers: AudioManager for TextToSpeech, LocationManager for GPS, SensorManager for accelerometer information, PowerManager for wake locks. Just about anything worth doing requires accessing a service provider. So as I started encapsulating functionality in classes I wasn't sure how to best go about initializing them. Do I keep passing around the context via the constructor, and provide setters and getters to inject mock services for unit testing? Do I use factories?

Luckily, I stumbled across RoboGuice, which extends Google's Guice dependency injection framework. Although the current "production ready" version of RoboGuice is at 1.1 (and uses Guice 2.0), RoboGuice version 2.0 uses Guice 3.0, and was simpler to set up — because it doesn't need a custom Application class. I'm all about simplicity (everything should be made as simple as possible, but not simpler).

Quick note here, I'm writing this from the point of retrofitting it to an already existing application. So I already have my application set up. I just want to take advantage of RoboGuice now to simplify it a bit.

Why?

Alright, so a good first question is, what the heck is RoboGuice, and why do I want to use it?

Essentially, RoboGuice is a dependency injection framework and allows for inversion of control. That's almost saying the same thing in two different ways. If that doesn't tell you anything, you should read up on those concepts. It would be silly for me to explain it here, since there are far better resources for that out there. In a nutshell, it helps streamline how objects are wired together by convention and configuration, allows for better separation of concerns, and, as a first and easy benefit to grasp, it reduces the amount of boilerplate code that needs to be written.
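
For a rough idea of what "wiring objects together" means, here is a minimal sketch in plain Java (8 or later), without RoboGuice or Android. The names Speaker and DistanceAnnouncer are invented for this example. The class receives its dependency through the constructor instead of looking it up itself, and that hand-over is the part a framework like RoboGuice automates via annotations.

```java
/** Hypothetical stand-in for something you'd normally fetch from a Context. */
interface Speaker {
    void say(String text);
}

/** Gets its dependency handed in, rather than going out to find it. */
class DistanceAnnouncer {
    private final Speaker speaker;

    DistanceAnnouncer(Speaker speaker) {
        this.speaker = speaker;
    }

    void announce(int kilometers) {
        speaker.say("You have covered " + kilometers + " kilometers");
    }
}

class InjectionDemo {
    public static void main(String[] args) {
        // Manual wiring: here the "service" is just a lambda writing to stdout.
        DistanceAnnouncer announcer = new DistanceAnnouncer(text -> System.out.println(text));
        announcer.announce(5); // prints "You have covered 5 kilometers"
    }
}
```

Because the dependency comes in from the outside, a unit test can pass in a fake Speaker that merely records what was said.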

Take a look at A Simple Example that RoboGuice provides.

So right off the bat, their example shows that it's dead simple to wire up view objects and system service providers, simply using annotations.

How?

As I said, I'm not even bothering with RoboGuice 1.1. Upgrading to RoboGuice 2.0 is explained on the RoboGuice Wiki. If you're completely new to it, however, it can be a bit overwhelming. To start from scratch, you need the following:

  1. Download the latest RoboGuice 2 snapshot. Currently, that's version 2.0b3. You'll want to drop this into your project's "/libs/" directory, which is where most other JAR files go as well if you use any (e.g. the fragments support backport android-support-v4.jar, or the Google Maps maps.jar, etc.)
  2. Download Guice 3.0. You want the guice-3.0-no_aop.jar. Again, this goes into the "/libs/" directory of your project.
  3. Not immediately obvious is that you'll also want to grab the guice-3.0.zip, because you need the javax.inject.jar from it. Yes, also goes into the "/libs/" directory.
  4. The JARs need to be added to your project, so in Eclipse, go to the Project menu, Properties, Java Build Path, Libraries tab, now "Add JARs", and add all three JARs (guice-3.0-no_aop.jar, javax.inject.jar, and roboguice-2.0b3.jar).

Okay, now your project has RoboGuice added to it, but nothing is using it yet.

Putting it to use

One of the first things you'll want to do is go into one of your application's existing Activities. If you're doing it the simple / old way your class probably just extends Activity. Just change it so it extends RoboActivity instead. If you're using fragments and your activity is a FragmentActivity class, just change it to be RoboFragmentActivity. If you're using any services, and have a class that extends Service, modify the class to extend RoboService instead.

Then go through your onCreate() methods, rip out the findViewById() calls, and replace them with @InjectView annotations in front of your property declarations. It's easy to just check A Simple Example for reference again.

Instead of a setContentView() call in onCreate(), you can use the @ContentView(R.layout.layoutname) annotation right before your class definition.

For example:

@ContentView(R.layout.record)
public class RecordActivity extends RoboFragmentActivity
{
    @InjectView(R.id.txtDistance)	TextView txtDistance;
    @InjectView(R.id.txtTime)		TextView txtTime;
    @InjectView(R.id.txtPace)		TextView txtPace;
    @InjectView(R.id.btnStart)		Button btnStart;
}

I hope that helps you get started quickly and painlessly.

Filed under: Android, Development 1 Comment

Let's try this again

Posted on January 1, 2012

Well, 2011 turned out to be a year of few blog entries. I tend to fall into a pattern where I go "oh yeah, this time I'll totally blog consistently", but that excitement dies down quickly. I think it's because much of what I do is what I'd describe as dabbling. I'll pick a technology, play with it, and prototype something. And then there are typically three outcomes: I get bored with it, my prototype is good enough for me (but in my opinion not worth showing off), or I realize that I lack the resources to bring the project to completion, and so I move on. Some projects are forgotten, others might get revisited later.

At the moment I'm dabbling with an Android project, and for now both personal interest and technical feasibility allow me to push this forward to see where things go. I'm hoping that one component or another is something I can share here, because otherwise I should just give up the blog thing.

Filed under: Development No Comments

The Amazon Appstore

Posted on March 24, 2011

So a few days ago Amazon opened their own Android Appstore, a direct competitor to Google's Android Market. There are other "app stores" out there, for instance, AppBrain, which has provided a much nicer web experience than the official Market until about two months ago, and SlideME, which is offering more payment method coverage, geographically.

So with Amazon entering the fray, the question is how successful they will be. This isn't just another application distribution channel, run by a small start-up hoping to make their mark. This is Amazon, after all, a company that's streamlined an online market for books and expanded it to cover just about anything these days. Not to mention their cloud services such as EC2 and S3.

For one, Amazon came in with quite a bang, an exclusive release of Angry Birds Rio, the third release in the Angry Birds game series that many "mobile gamers" are so gung-ho about. Not to mention, the Angry Birds Rio release was free, at least for us consumers. I'm speculating that Amazon is absorbing most if not all of the purchase price for each distribution and treats every download as a conversion.

Second, although the game itself is free, one is required to set up payment information (e.g. entering credit card details to be kept on file) before the download is granted. This really is the magic key, as it sets up users to be able to make impulse purchases. After all, it was Amazon who "invented" (or at least patented) the 1-click purchase.

Going forward, it seems that Amazon is offering a "free app of the day" in order to attract new accounts.

But will it work? Are consumers aware that their actual purchases are made against Amazon's Appstore, and that this fosters a dependency on this Appstore instead of the more official (Google) Market? I, for one, think I'll continue to look at the free apps that are being offered, but ultimately, if I wanted to buy an app, I'd want to purchase it through the "more official" Market. Losing access to (in my eyes) a bunch of freebies I've acquired on the Amazon Appstore would be preferable to having to worry about where I bought something and how to get it all back after upgrading to another phone, for instance.

And ultimately, should Amazon decide to enter the Android tablet market, things may become even more interesting. A color version of the Kindle, similar to the Nook Color, could run Android but not include the Google licensed apps (which include the Market), and in that case, Amazon's devices would rely solely on their own Appstore.

The danger then is a fragmented Android market, and the potential to annoy consumers who are trying to get the Android phone applications they previously purchased on the Google Market working on an Amazon tablet that only runs the Appstore.

Purely speculation, of course. But an interesting scenario nonetheless, and not too far fetched as far as I'm concerned.


Finding Duplicate or Similar Images with Perceptual Hashing in PHP

Posted on November 22, 2010

A while ago on Reddit someone asked how TinEye works. It's pretty fascinating: you upload a photo (or point it at the URL of an image) and it'll find other locations with similar images, if they've been indexed. It works even if those images are in different sizes, or have had minor changes made to them, be it due to compression or because someone added or removed some text. So in a way, it's a fuzzy image-based search engine.

Although I'm sure TinEye has its own set of algorithms and custom applications to drive all this, something similar, if crude in implementation, can be achieved with available software.

At the center of all this is an image hashing algorithm. Usually (cryptographic) hashes are designed to detect even the slightest modifications and return a completely different hash. We're looking for the opposite, and libphash delivers:

The phash library implements a "perceptual hashing" algorithm. From their site:

A perceptual hash is a fingerprint of a multimedia file derived from various features from its content. Unlike cryptographic hash functions which rely on the avalanche effect of small changes in input leading to drastic changes in the output, perceptual hashes are "close" to one another if the features are similar.

So the idea is that you feed it an image, and it'll return a hash, and the more similar two hashes are to one another, the more alike the images are. The comparison of similarity is done by calculating the Hamming distance (the number of bit positions in which two hashes differ), which in a way is the bit-level version of the levenshtein() function that PHP developers may already be familiar with.
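
To make the distance calculation concrete, here is a small sketch. It's in Java (8 or later) rather than PHP, and it assumes the hash is a 64-bit value that may be stored as a hexadecimal string of up to 16 digits. The class and method names are made up for this example and are not part of the phash bindings.

```java
/** Hypothetical helper for comparing 64-bit perceptual hashes. */
class HammingDistance {

    /** Number of bit positions in which the two hashes differ. */
    static int between(long a, long b) {
        // XOR leaves a 1 wherever the bits differ; bitCount counts those 1s.
        return Long.bitCount(a ^ b);
    }

    /** Same comparison for hashes stored as hexadecimal strings (up to 16 digits). */
    static int betweenHex(String hexA, String hexB) {
        return between(Long.parseUnsignedLong(hexA, 16), Long.parseUnsignedLong(hexB, 16));
    }

    public static void main(String[] args) {
        System.out.println(betweenHex("ff00ff00ff00ff00", "ff00ff00ff00ff01")); // prints 1
        System.out.println(betweenHex("0", "ffffffffffffffff"));                // prints 64
    }
}
```

A distance of 0 means the hashes are identical. How small a distance should count as "the same image" is a threshold you'd have to pick by experimenting with your own photos.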

Even better, the phash library comes with PHP bindings, and provides a few functions to get you started. For instance, there's the ph_image_hash() function. Simply give it a filename of an image, and it'll return, well, a resource. Now this really puts a damper on the usefulness of that function, since resources are fairly opaque and hard if not impossible to work with.

Fear not, I've made a few changes so that ph_image_hash() returns a plain string with the hexadecimal representation of the hash, which can then be stored in a database, for instance. You can grab phash from my github repository.

Alright, you've got a way to get to those hashes, now how does one index them and look them up in a speedy way? Well, this is where it gets interesting, and unfortunately a little bit theoretical.

Ideally, you'll store these hashes (and other metadata) in a database, but not a SQL database. You really want a vantage point tree, or better, a multiple vantage point tree. Essentially these are trees (binary, in the simple vantage point case) that are built up based on the distances between hashes. The idea is that hashes that are similar are close together. So you traverse the tree in order to get close to your match, and then just "look around" in that area and you'll likely find similar results.
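
To illustrate the idea, here is a minimal sketch of the simpler of the two structures, a single vantage point tree, in Java (8 or later). It is not the MVPTree code that ships with phash. The assumptions are mine: hashes are 64-bit values, the metric is the Hamming distance, and the vantage point is naively the first element of each list. An MVP tree generalizes this by using more than one vantage point per node.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a vantage point tree over 64-bit hashes (Hamming distance as metric). */
class VpTree {
    private final long vantagePoint;
    private final int threshold;  // hashes at most this far from the vantage point go "inside"
    private final VpTree inside;  // may be null
    private final VpTree outside; // may be null

    private VpTree(long vantagePoint, int threshold, VpTree inside, VpTree outside) {
        this.vantagePoint = vantagePoint;
        this.threshold = threshold;
        this.inside = inside;
        this.outside = outside;
    }

    static int distance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    /** Builds a tree from the given hashes; returns null for an empty list. */
    static VpTree build(List<Long> hashes) {
        if (hashes.isEmpty()) return null;
        final long vp = hashes.get(0); // naive choice of vantage point
        List<Long> rest = new ArrayList<>(hashes.subList(1, hashes.size()));
        if (rest.isEmpty()) return new VpTree(vp, 0, null, null);

        // The median distance to the vantage point becomes the split threshold.
        rest.sort((a, b) -> Integer.compare(distance(vp, a), distance(vp, b)));
        int threshold = distance(vp, rest.get(rest.size() / 2));

        List<Long> near = new ArrayList<>();
        List<Long> far = new ArrayList<>();
        for (long h : rest) {
            (distance(vp, h) <= threshold ? near : far).add(h);
        }
        return new VpTree(vp, threshold, build(near), build(far));
    }

    /** Adds every stored hash within maxDistance bits of the query to results. */
    void search(long query, int maxDistance, List<Long> results) {
        int d = distance(query, vantagePoint);
        if (d <= maxDistance) results.add(vantagePoint);
        // Triangle inequality: skip a side that cannot contain a match.
        if (inside != null && d - maxDistance <= threshold) inside.search(query, maxDistance, results);
        if (outside != null && d + maxDistance > threshold) outside.search(query, maxDistance, results);
    }
}

class VpTreeDemo {
    public static void main(String[] args) {
        List<Long> hashes = new ArrayList<>();
        hashes.add(0xFF00FF00FF00FF00L);
        hashes.add(0xFF00FF00FF00FF01L); // 1 bit away from the first hash
        hashes.add(0x00FF00FF00FF00FFL); // 64 bits away from the first hash
        VpTree tree = VpTree.build(hashes);

        List<Long> similar = new ArrayList<>();
        tree.search(0xFF00FF00FF00FF00L, 4, similar);
        System.out.println(similar.size()); // prints 2
    }
}
```

The pruning in search() is where the speed-up comes from: whole subtrees get skipped when the distance to the vantage point rules them out. This sketch keeps everything in memory and does nothing about balancing or persistence.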

The MVP tree area gets pretty academic, and from what I've looked at so far, most of it is theoretical, presented as limited in applicability, but at the same time seems to be exactly what such a fuzzy image search engine would need to be based on. I'm fairly certain companies working on various aspects of image recognition and augmented reality and what not are all messing with this sort of thing, so there's likely very little incentive for them to publicize or advertise their algorithms.

The phash library does include an "MVPTree" library with basic examples of how data is stored. Having something like this built out into a scalable data store with an HTTP interface a la Solr would be fantastic.

I'd immediately work on a PHP application to index my photos, detect duplicates, etc.

Filed under: PHP 2 Comments