Short URLs with Zend Framework
First up, what's a short URL? A short URL is just that: a URL that is as short as it can possibly be, so that it takes up as few characters as possible when it is used in a Twitter message. Tweets are limited to 140 characters, which is probably the main reason short URLs are so popular. Each character counts.
Technically, short URLs consist of a short domain name and a simple identifier, usually the numeric primary key in a database table of whatever item the page is supposed to be for. And to make that number even shorter it's typically base 62 encoded.
The digits are represented using the numbers 0-9, lowercase a-z and uppercase A-Z. And although PHP offers a base_convert() function, it's unfortunately useless here, as it only supports bases up to 36 and loses precision on large numbers (it uses floating point math internally). So a replacement is needed.
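To make the arithmetic concrete, here is a minimal sketch of base 62 encoding by repeated division. The function name base62_encode_sketch is made up for this illustration and is not part of PHP or Zend Framework; it assumes the digit order 0-9, a-z, A-Z described above.

```php
<?php
// Hypothetical helper for illustration only.
function base62_encode_sketch($number)
{
    // Assumed digit order: 0-9, then a-z, then A-Z.
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $number = (int) $number;
    if ($number < 0) {
        return false; // only non-negative integers are supported
    }
    $encoded = '';
    do {
        $remainder = $number % 62;                       // value of the lowest base 62 digit
        $encoded   = $alphabet[$remainder] . $encoded;   // prepend the symbol for that digit
        $number    = ($number - $remainder) / 62;        // exact division, so this stays an integer
    } while ($number > 0);

    return $encoded;
}

echo base62_encode_sketch(1000000), "\n"; // 4c92
```

Working through 1000000 by hand gives the digits 4, 12, 9 and 2, which map to "4", "c", "9" and "2".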
There are all kinds of base62 encoding and decoding functions out there already. One is bc_base_convert, which requires the bcmath extension. Another one, a bit more fleshed out and cleaner looking, is a class I found on pastie while browsing reddit. I've reproduced it here for easy reference:
/**
* @class Integer
* @author Julien Garand (Go On Web)
*
* Can encode and decode integers to/from a string, using a custom alphabet
*/
class Integer
{
// Default alphabet for a "normal" base 62 encoding
static protected $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
static protected $base = 62;
/**
* Define your custom alphabet here
*/
static public function setAlphabet( $alphabet )
{
// only strings are allowed
if ( !is_string($alphabet) )
{
throw new Exception('Given alphabet is not a string !');
}
$base = strlen( $alphabet ); // Our base will be the length of the given alphabet
// We check that the alphabet has no duplicate characters before changing any state
if ( strlen( count_chars( $alphabet, 3 ) ) != $base )
{
throw new Exception('The following alphabet has duplicate characters : '.$alphabet);
}
self::$alphabet = $alphabet; // store it
self::$base = $base; // and the matching base
}
/**
* Basic accessors
*/
static public function getAlphabet() { return self::$alphabet; }
static public function getBase() { return self::$base; }
/**
* Encode an integer according to the defined alphabet
*
* @param integer : Unsigned integer to be encoded
* @return string (or false if failed)
*/
static public function encode( $integer )
{
$integer = (int)$integer; // Be sure to have an integer
// We only accept unsigned integers
if ( $integer < 0 )
{
return false; // or throw new Exception( "($integer) is less than 0 and cannot be converted" );
}
$string = ''; // our encoded integer
// zero never enters the loop below, so encode it explicitly
if ( $integer === 0 )
{
return self::$alphabet[0];
}
// loop while there are digits left to encode
while( $integer )
{
$pos = $integer % self::$base; // remainder of the Euclidean division...
$string .= self::$alphabet[ $pos ]; // ...is the position of the char in the alphabet
$integer = ( $integer - $pos ) / self::$base; // drop the encoded digit and divide by the base
}
return strrev( $string ); // we started with the units digit ( $base ^ 0 ), so reverse the string
}
/**
* Decode a string to an integer according to the defined alphabet
*
* @param string : String to be decoded
* @return integer (or false if failed)
*/
static public function decode( $string )
{
$string = (string)$string; // be sure to have a string;
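// check that our string only has chars that are in the alphabet
// (strspn gives the length of the leading run of allowed chars, which must cover the whole string)
if ( strspn( $string, self::$alphabet ) !== strlen( $string ) )
{
return false; // or throw new Exception( "($string) is not a string or contains characters that are not in alphabet" );
}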
$integer = 0; // our integer to find
$unit = 1; // we start by $base^0
// foreach chars, starting at the end
for( $i = strlen( $string ) -1; $i >= 0; $i -- )
{
$pos = strpos( self::$alphabet, $string[$i] ); // we find its position in the alphabet
$integer += $pos * $unit; // that's our number to add, multiplied by the current unit
$unit = $unit * self::$base; // and go to next unit in our base
}
return $integer;
}
}
So now we can convert our simple numbers into the more cryptic looking short URL identifiers simply by using Integer::encode(). Here are some example conversions:
1 => 1
10 => a
100 => 1C
255 => 47
1000 => g8
10000 => 2Bi
65535 => h31
100000 => q0U
1000000 => 4c92
Instead of http://example.com/1000000, you could end up with http://example.com/4c92. There, three characters saved. That makes a difference, particularly with really short domain names, such as Twitter's own URL shortener: http://t.co.
Working with Routes
In Zend Framework the actual page logic lives in controllers and actions, which are essentially classes and methods, respectively. In order to reach a controller and action, request URLs are routed using the router.
The default route is sufficient for most applications. It conveniently maps the first two path segments to controller and action. So http://example.com/photo/view maps to the PhotoController::viewAction(). Also, the action is optional, and if omitted will default to index. Therefore, http://example.com/photo will map to PhotoController::indexAction(). There are other scenarios that are helpful to be familiar with.
Now the easiest way to support short URLs is to add a route that will match any alphanumeric characters and route that to the desired destination. That could look something like this:
$router = Zend_Controller_Front::getInstance()->getRouter();
$router->addRoute('photo', new Zend_Controller_Router_Route(':shortid', array(
'controller' => 'photo',
'action' => 'view'
), array(
'shortid' => '[0-9a-zA-Z]+'
)));
The side effect, however, is that there's no longer a distinction between a short URL such as "http://example.com/1hF" and "http://example.com/photo", since "photo" could be a base62 encoded number (in fact, it would be the number 373,554,054). This can be worked around if you make all other URLs specify both the controller and action explicitly, so you'd use "http://example.com/photo/index" to ensure that the short URL route doesn't match.
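The ambiguity is easy to reproduce outside the framework. The following is a minimal sketch, assuming the 0-9, a-z, A-Z alphabet from earlier; base62_decode_sketch is a hypothetical helper written for this example, and the pattern is the same requirement the route uses.

```php
<?php
// Hypothetical helper for illustration only: decodes a base 62 string,
// reading digits left to right (most significant digit first).
function base62_decode_sketch($string)
{
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $value = 0;
    for ($i = 0, $n = strlen($string); $i < $n; $i++) {
        $pos = strpos($alphabet, $string[$i]);
        if ($pos === false) {
            return false; // character is not part of the alphabet
        }
        $value = $value * 62 + $pos;
    }
    return $value;
}

$requirement = '[0-9a-zA-Z]+'; // same requirement as the 'shortid' route parameter

foreach (array('1hF', 'photo') as $segment) {
    $matches = preg_match('#^' . $requirement . '$#', $segment) === 1;
    printf(
        "%-5s matches: %s, decodes to: %d\n",
        $segment,
        $matches ? 'yes' : 'no',
        base62_decode_sketch($segment)
    );
}
```

Both segments satisfy the requirement, and "photo" decodes to 373554054, so the route alone cannot tell a controller name from a short ID.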
Then, in your controller's action, you'd handle the request using the short URL:
class PhotoController extends Zend_Controller_Action
{
public function viewAction()
{
// fall back to a regular 'id' parameter so $id is always defined
$id = $this->_getParam('id');
if ($shortId = $this->_getParam('shortid')) {
$id = Integer::decode($shortId);
}
// rest of the logic to view the photo here, using $id.
}
}
This technique may not always apply, however, since you might already have a larger application that has all kinds of links that you can't just change to make this short URL thing work.
Another technique is to modify the short URLs a bit so they're more easily recognizable as such. I did that for one application by sacrificing one extra character: I just prefixed all the short IDs with an upper case "S". So you'd have a URL such as http://example.com/S4c92. This works since normally the URLs are all lower-case anyway:
$routes['twitter-pics'] = new Zend_Controller_Router_Route_Regex(
'(?-i)S([0-9a-zA-Z]+)', // only base 62 characters; \w would also allow underscores
array('controller' => 'photos',
'action' => 'view'),
array('shortid' => 1),
'S%s' // reverse pattern: keeps the "S" prefix when assembling URLs
);
Note that this is a regular expression based route. The (?-i) turns off case insensitivity. I still wasn't happy with this approach, because the action still needs to explicitly handle that 'shortid' variable.
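The effect of (?-i) can be checked with preg_match() directly. This is only a sketch: it assumes the regex route wraps its pattern as '#^' . $regex . '$#i' before matching, which is worth verifying against your Zend Framework version, and it uses a pattern of the same shape as the route above with the capture limited to base 62 characters.

```php
<?php
// Assumption: the route matches with '#^' . $regex . '$#i' (case-insensitive by default).
$regex   = '(?-i)S([0-9a-zA-Z]+)';
$pattern = '#^' . $regex . '$#i';

$results = array();
foreach (array('S4c92', 's4c92', 'photo') as $path) {
    // Store the captured short ID on a match, null otherwise.
    $results[$path] = preg_match($pattern, $path, $m) === 1 ? $m[1] : null;
}

var_export($results);
```

Only "S4c92" matches and yields the short ID "4c92". The lower-case "s4c92" and the plain "photo" are rejected, because (?-i) switches the rest of the pattern back to case-sensitive matching.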
Using a Custom Route
I wanted everything encapsulated in the route, so I wrote a custom route class.
The interface that Zend provides is rather straightforward:
interface Zend_Controller_Router_Route_Interface {
public function match($path);
public function assemble($data = array(), $reset = false, $encode = false);
public static function getInstance(Zend_Config $config);
}
match checks whether the route matches the path of the request.
assemble is used to build a URL based on the parameters.
getInstance is supposed to accept a configuration and return a new instance of the route. I don't even care about that at the moment.
Here's the finished class:
/**
* Short Route
*
* Provides short URLs
*
* @author Marcus Welz
*
*/
class Td_Controller_Router_ShortRoute implements Zend_Controller_Router_Route_Interface
{
/**
* @var string The URL prefix
*/
protected $_urlPrefix = 'S';
/**
* @var array The parameter as passed to the request
*/
protected $_params = array();
/**
*
* @param string $urlPrefix The prefix of the URL
* @param array $params The parameters as passed to the request
*/
public function __construct($urlPrefix, $params = array())
{
$this->_urlPrefix = $urlPrefix;
$this->_params = $params;
}
/**
* @param string $path The URL such as "/P3"
* @return array|false returns parameters including the id on success, false if no match
*/
public function match($path)
{
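$prefix = preg_quote($this->_urlPrefix, '/'); // also escape the regex delimiter
// anchor at the start and use A-Za-z; the range A-z would also match [ \ ] ^ _ and `
if (preg_match('/^\/' . $prefix . '([A-Za-z0-9]+)$/', $path, $matches)) {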
$params = $this->_params;
$params['id'] = Integer::decode($matches[1]);
return $params;
}
return false;
}
/**
* Assemble a URL using the ID
*
*
* @param array $data 'id' is the only used parameter in the array
* @param bool $reset unused / ignored
* @param bool $encode unused / ignored
*/
public function assemble($data = array(), $reset = false, $encode = false)
{
return $this->_urlPrefix . Integer::encode($data['id']);
}
public static function getInstance(Zend_Config $config)
{
throw new Exception('not implemented');
}
}
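The matching and assembling logic can be tried out without a full Zend Framework setup. The class below is a stand-alone sketch under that assumption: ShortRouteSketch is a hypothetical name, it does not implement Zend_Controller_Router_Route_Interface, and it inlines a small base 62 conversion instead of using the Integer class.

```php
<?php
// Stand-alone sketch for illustration; not a drop-in Zend Framework route.
class ShortRouteSketch
{
    const ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

    private $prefix;
    private $params;

    public function __construct($prefix, array $params = array())
    {
        $this->prefix = $prefix;
        $this->params = $params;
    }

    // Returns the route parameters plus the decoded 'id', or false if the path does not match.
    public function match($path)
    {
        $pattern = '/^\/' . preg_quote($this->prefix, '/') . '([A-Za-z0-9]+)$/';
        if (!preg_match($pattern, $path, $matches)) {
            return false;
        }
        $id = 0;
        foreach (str_split($matches[1]) as $char) {
            $id = $id * 62 + strpos(self::ALPHABET, $char); // most significant digit first
        }
        $params = $this->params;
        $params['id'] = $id;
        return $params;
    }

    // Builds the prefixed short path segment for a numeric, non-negative id.
    public function assemble(array $data)
    {
        $id = (int) $data['id'];
        $encoded = '';
        do {
            $remainder = $id % 62;
            $encoded   = substr(self::ALPHABET, $remainder, 1) . $encoded;
            $id        = ($id - $remainder) / 62; // exact division, stays an integer
        } while ($id > 0);
        return $this->prefix . $encoded;
    }
}

$route  = new ShortRouteSketch('S', array('controller' => 'photo', 'action' => 'view'));
$params = $route->match('/S4c92');

echo $params['id'], "\n";                              // 1000000
echo $route->assemble(array('id' => 1000000)), "\n";   // S4c92
var_dump($route->match('/photo'));                     // bool(false)
```

The round trip shows the point of the custom route: the controller only ever sees the numeric id, and URL generation only ever needs the numeric id.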
Using it is straightforward. First, add it to the router:
Zend_Controller_Front::getInstance()->getRouter()
->addRoute('photo', new Td_Controller_Router_ShortRoute('S', array(
'controller' => 'photo',
'action' => 'view'
)));
Since the conversion between base 62 and base 10 is happening inside the class, the action doesn't have to decode it itself and is thus blissfully unaware of it. Encapsulation successful. And to generate a URL in a view, you'd use the url() view helper:
$this->url(array('id'=> $photo['id']), 'photo')
Good enough for me.