Android: Ducking Audio with TextToSpeech
The Android app I'm working on makes use of TextToSpeech. It's also very likely that the user is playing music in the background. In order for the music to not drown out the TTS output, we'll make use of audio ducking, a feature that's "new" as of API level 8 (Android 2.2).
The ingredients needed for this are an instance of the TextToSpeech class and the AudioManager.
In order to get spoken text, we use TextToSpeech.speak(String text, int mode, HashMap<String,String> params);
Now, in order to properly get any background music to duck when we start speaking, and raise the volume when we're done speaking, we'll need to know when the TTS is done blabbering. And for this, we can register a callback via TextToSpeech.setOnUtteranceCompletedListener(). There are, however, a few pitfalls that we'll need to watch out for:
- TextToSpeech.setOnUtteranceCompletedListener() MUST be invoked after onInit() was called. Otherwise it won't necessarily register properly, and onUtteranceCompleted() may never be called.
- When invoking TextToSpeech.speak(), we must provide an "Utterance ID" via the params HashMap using TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID. If there's no ID, onUtteranceCompleted() will not be invoked.
- Because it is possible to queue up more utterances via TextToSpeech.speak() before the previous one has completed, we'll need to track how many we've queued up, so we know to only release audio focus after all of them have been completed. Otherwise we've got a race condition where we queued up two things to speak, but lose focus after the first one has completed. The result is a mess where music will cut in and out while text is being spoken.
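The bookkeeping from the last pitfall can be sketched on its own, without any Android classes. This is only an illustration: UtteranceCounter and its method names are hypothetical and not part of the Android API, and the synchronized methods assume the completion callback may arrive on a different thread than the one calling speak().

```java
/** Hypothetical helper: tracks queued utterances and reports focus transitions. */
class UtteranceCounter {
    private int pending = 0;

    /** Call when an utterance is queued. Returns true if audio focus should be requested now. */
    synchronized boolean onQueued() {
        pending++;
        return pending == 1; // only the first utterance triggers a focus request
    }

    /** Call when an utterance completes. Returns true if audio focus should be abandoned now. */
    synchronized boolean onCompleted() {
        if (pending == 0) {
            return false; // stray callback, nothing left to release
        }
        pending--;
        return pending == 0; // only the last utterance releases focus
    }

    /** Number of utterances queued but not yet completed. */
    synchronized int pending() {
        return pending;
    }
}
```

The idea is to call onQueued() wherever an utterance is handed to the engine and request focus only when it returns true, then call onCompleted() from the completion callback and abandon focus only when that returns true.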
With this in mind, we'll create a wrapper class which will encapsulate this functionality.
public class DuckingTTS {
// Log tag
private final String TAG = DuckingTTS.class.getName();
// Debugging is on (set to false to cut down on spam)
private static final boolean D = false;
/**
* Which audio stream to use. We use the music one here,
* so that it'll be "in line" with the music volume from whatever is playing.
*/
private final int STREAM_TYPE = AudioManager.STREAM_MUSIC;
/**
* Parameters we're feeding with each invocation of speak()
*/
private HashMap<String, String> ttsParams;
/**
* How many utterances are playing at a particular moment.
*/
private int mUtterancesPlaying = 0;
/**
* The text-to-speech engine
*/
private TextToSpeech mTts;
/**
* Whether TTS is initialized (onInit() called yet?)
*/
private boolean mIsInitialized = false;
/**
* Queue up chatter when TTS is not initialized yet
*/
private List<String> queue = new ArrayList<String>();
/**
* AudioManager injected via RoboGuice
*/
@Inject
private AudioManager mAm;
/**
* Context will be injected via RoboGuice
* @param context
*/
@Inject
public DuckingTTS(Context context) {
ttsParams = new HashMap<String, String>();
ttsParams.put(TextToSpeech.Engine.KEY_PARAM_STREAM, String.valueOf(STREAM_TYPE));
ttsParams.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, "ID");
mTts = new TextToSpeech(context, ttsOnInitListener);
mIsInitialized = false;
}
public boolean speak(String text) {
if (!mIsInitialized) {
if (D) Log.d(TAG, "speak(\"" + text + "\") - queued");
queue.add(text);
return false;
}
// Tell it to speak
if (D) Log.d(TAG, "speak(\"" + text + "\") - sending to TTS");
mTts.speak(text, TextToSpeech.QUEUE_ADD, ttsParams);
if (mAm == null) return true;
// If this is the first utterance (i.e. we're not already talking), request audio focus with ducking.
if (mUtterancesPlaying < 1) {
int status = mAm.requestAudioFocus(audioFocus, STREAM_TYPE, AudioManager.AUDIOFOCUS_GAIN_TRANSIENT_MAY_DUCK);
if (status == AudioManager.AUDIOFOCUS_REQUEST_FAILED) {
Log.e(TAG, "speak() audio focus request failed.");
}
}
mUtterancesPlaying++;
return true;
}
public void shutdown() {
mTts.shutdown();
}
private TextToSpeech.OnInitListener ttsOnInitListener =
new TextToSpeech.OnInitListener() {
/**
* Callback for when the TextToSpeech engine was initialized.
* Result will tell us whether this was successful or not.
*
* @param status
*/
public void onInit(int status) {
if (status != TextToSpeech.SUCCESS) {
return; // we just abort on failure, it's never fully initialized
// this can be bad, by the way, because every speak() call will now add something to the queue.
}
mIsInitialized = true;
mTts.setOnUtteranceCompletedListener(ttsOnUtteranceCompletedListener);
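// Also speak anything that was queued up so far, then empty the queue
// so the buffered text isn't kept around after it has been spoken.
for (String text : queue) {
speak(text);
}
queue.clear();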
}
};
private TextToSpeech.OnUtteranceCompletedListener ttsOnUtteranceCompletedListener =
new TextToSpeech.OnUtteranceCompletedListener() {
/**
* Callback when TTS has completed an utterance.
*/
public void onUtteranceCompleted(String utteranceId) {
if (D) Log.d(TAG, "onUtteranceCompleted(\"" + utteranceId + "\")");
mUtterancesPlaying--;
if (mAm == null) return;
// once we're done speaking, lose audio focus.
if (mUtterancesPlaying < 1) {
mUtterancesPlaying = 0;
mAm.abandonAudioFocus(audioFocus);
}
}
};
private AudioManager.OnAudioFocusChangeListener audioFocus =
new AudioManager.OnAudioFocusChangeListener() {
public void onAudioFocusChange(int focusChange) {
// I don't think we actually care.
if (D) Log.d(TAG, "onAudioFocusChange(" + focusChange + ")");
}
};
}
That's it. I use it in a service, and I wire it in with a simple annotation using RoboGuice:
public class GpsService extends RoboService {
@Inject
private DuckingTTS mDuckingTTS;
// ... rest of the class implementation ...
}
As the final disclaimer: The class works for me and my purposes at the moment, but it doesn't handle every error scenario. Also, don't forget to call shutdown() to release the TTS resources.
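One of those unhandled scenarios is a failed onInit(), after which every speak() call keeps adding to the queue. A minimal sketch of one way to bound that, again in plain Java with no Android classes: PendingSpeech, its capacity parameter, and the drop-oldest policy are all assumptions made for illustration, not something the class above does.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper: buffers text until the engine reports its init result. */
class PendingSpeech {
    enum State { WAITING, READY, FAILED }

    private final int capacity;
    private final Deque<String> buffer = new ArrayDeque<String>();
    private State state = State.WAITING;

    PendingSpeech(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    /** Returns true if the caller should speak the text right away. */
    synchronized boolean offer(String text) {
        if (state == State.READY) {
            return true;
        }
        if (state == State.FAILED) {
            return false; // init failed: drop the text instead of buffering forever
        }
        if (buffer.size() == capacity) {
            buffer.removeFirst(); // buffer full: drop the oldest entry
        }
        buffer.addLast(text);
        return false;
    }

    /** Call once the init result is known. Returns the texts to speak now (empty on failure). */
    synchronized List<String> onInit(boolean success) {
        state = success ? State.READY : State.FAILED;
        List<String> drained = success ? new ArrayList<String>(buffer) : new ArrayList<String>();
        buffer.clear();
        return drained;
    }
}
```

Dropping the oldest entries is a judgment call. For spoken status updates stale text is rarely worth saying, but other applications may prefer to reject new entries instead.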
Android: Getting started with RoboGuice 2.0 (beta 3)
When I started messing around with Android, it consisted mostly of copying and pasting example code together to quickly get some results. That works, but the unfortunate side effect is that the Activity or Service class balloons out with functionality and features that are better off encapsulated according to proper object oriented concepts and best practices and what not.
However, once I started spending more time modeling classes, I realized that there are an awful lot of cases where you'll need to pass around contexts in order to get access to service providers. AudioManager for TextToSpeech, LocationManager for GPS, SensorManager for accelerometer information, PowerManager for wake locks: just about anything worth doing required accessing a service provider. So as I started encapsulating functionality in classes, I wasn't sure how best to go about initializing them. Do I keep passing around the context via the constructor, and provide setters and getters to inject mock services for unit testing? Do I use factories?
Luckily, I stumbled across RoboGuice, which extends Google's Guice dependency injection framework. Although the current "production ready" version of RoboGuice is 1.1 (and uses Guice 2.0), RoboGuice version 2.0 uses Guice 3.0 and was simpler to set up, because it doesn't need a custom Application class. I'm all about simplicity (everything should be made as simple as possible, but not simpler).
Quick note here: I'm writing this from the perspective of retrofitting it into an already existing application. So I already have my application set up. I just want to take advantage of RoboGuice now to simplify it a bit.
Why?
Alright, so a good first question is, what the heck is RoboGuice, and why do I want to use it?
Essentially, RoboGuice is a dependency injection framework and allows for inversion of control. That's almost saying the same thing in two different ways. If that doesn't tell you anything, you should read up on those concepts. It would be silly for me to explain it here, since there are far better resources for that out there. In a nutshell, it helps streamline how objects are wired together by convention and configuration, allows for better separation of concerns, and, as the first and easiest benefit to grasp, reduces the amount of boilerplate code that needs to be written.
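Without trying to explain the concepts in full, a tiny framework-free sketch can at least show the shape of the idea. Nothing here uses RoboGuice or Guice, and Speaker, PaceAnnouncer, and RecordingSpeaker are hypothetical names invented for the illustration. The point is only that the class receives its collaborator from outside instead of constructing it, which is the wiring a framework like RoboGuice would automate.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction over "something that can speak". */
interface Speaker {
    void say(String text);
}

/** Depends on a Speaker but does not create one; whoever constructs it supplies one. */
class PaceAnnouncer {
    private final Speaker speaker;

    PaceAnnouncer(Speaker speaker) {
        this.speaker = speaker;
    }

    void announce(int minutes, int seconds) {
        // Zero-pad the seconds by hand so the output doesn't depend on the locale.
        String padded = (seconds < 10 ? "0" : "") + seconds;
        speaker.say("Pace " + minutes + ":" + padded);
    }
}

/** Test double that records what would have been spoken. */
class RecordingSpeaker implements Speaker {
    final List<String> spoken = new ArrayList<String>();

    public void say(String text) {
        spoken.add(text);
    }
}
```

Because PaceAnnouncer never looks up its own Speaker, a unit test can hand it a RecordingSpeaker while the real application hands it something backed by TextToSpeech.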
Take a look at A Simple Example that RoboGuice provides.
So right off the bat, their example shows that it's dead simple to wire up view objects and system service providers, simply using annotations.
How?
As I said, I'm not even bothering with RoboGuice 1.1. Upgrading to RoboGuice 2.0 is explained on the RoboGuice Wiki. If you're completely new to it, however, it can be a bit overwhelming. To start from scratch, you need the following:
- Download the latest RoboGuice 2 snapshot. Currently, that's version 2.0b3. You'll want to drop this into your project's "/libs/" directory, which is where most other JAR files go as well, if you use any (e.g. the fragments support backport android-support-v4.jar, or the Google Maps maps.jar).
- Download Guice 3.0. You want the guice-3.0-no_aop.jar. Again, this goes into the "/libs/" directory of your project.
- Not immediately obvious is that you'll also want to grab the guice-3.0.zip, because you need the javax.inject.jar from it. Yes, this also goes into the "/libs/" directory.
- The JARs need to be added to your project, so in Eclipse, go to the Project menu, Properties, Java Build Path, Libraries tab, click "Add JARs", and add all three JARs (guice-3.0-no_aop.jar, javax.inject.jar, and roboguice-2.0b3.jar).
Okay, now your project has RoboGuice added to it, but nothing is using it yet.
Putting it to use
One of the first things you'll want to do is go into one of your application's existing Activities. If you're doing it the simple / old way your class probably just extends Activity. Just change it so it extends RoboActivity instead. If you're using fragments and your activity is a FragmentActivity class, just change it to be RoboFragmentActivity. If you're using any services, and have a class that extends Service, modify the class to extend RoboService instead.
Then go through your onCreate() methods, rip out the findViewById() calls, and replace them with @InjectView annotations in front of your property declarations. It's easy to just check A Simple Example for reference again.
Instead of a setContentView() call in onCreate(), you can use the @ContentView(R.layout.layoutname) annotation right before your class definition.
For example:
@ContentView(R.layout.record)
public class RecordActivity extends RoboFragmentActivity
{
@InjectView(R.id.txtDistance) TextView txtDistance;
@InjectView(R.id.txtTime) TextView txtTime;
@InjectView(R.id.txtPace) TextView txtPace;
@InjectView(R.id.btnStart) Button btnStart;
}
I hope that helps you get started quickly and painlessly.