Paginating Zend_Search_Lucene results
This short entry was inspired by a snippet of inefficient code I encountered, which involved iterating over an array with a loop and breaking out of it once enough results were fetched.
Zend_Search_Lucene does not paginate results; find() simply returns an array of hits. It does let you cap the result set at the first N hits (using Zend_Search_Lucene::setResultSetLimit($limit)), but that alone is not much help for pagination, because it only limits the count and does not let you skip ahead to a later page.
$lucene = Zend_Search_Lucene::open('index');
$hits = $lucene->find('author:"mark twain"');
$page = 1;
$perpage = 10;
return array_slice($hits, ($page - 1) * $perpage, $perpage);
The key element here, of course, is the use of array_slice(), which can be used with any array. So this isn't specific to Zend_Search_Lucene in any way.
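To make the offset arithmetic explicit, here is a minimal sketch of a reusable helper. The function name paginate() and the keys of the returned array are made up for this illustration and are not part of Zend_Search_Lucene; the sketch assumes pages are numbered from 1 and clamps out-of-range page numbers.

```php
<?php
/**
 * Hypothetical helper: returns one page of any array plus some paging info.
 * Pages are assumed to be 1-based; out-of-range pages are clamped.
 */
function paginate(array $items, $page, $perpage)
{
    $perpage = max(1, (int) $perpage);
    $total   = count($items);
    // always report at least one page, even for an empty result set
    $pages   = max(1, (int) ceil($total / $perpage));
    // clamp the requested page into the valid range
    $page    = min(max(1, (int) $page), $pages);

    return array(
        'page'  => $page,
        'pages' => $pages,
        'total' => $total,
        'items' => array_slice($items, ($page - 1) * $perpage, $perpage),
    );
}

// dummy data standing in for the $hits array returned by find()
$hits   = range(1, 25);
$result = paginate($hits, 3, 10);

echo 'Page ' . $result['page'] . ' of ' . $result['pages'] . ': '
   . implode(', ', $result['items']) . "\n";
// prints "Page 3 of 3: 21, 22, 23, 24, 25"
```

Returning the page count alongside the slice keeps everything a "previous / next" navigation needs in one place.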
Handling CSV data with PHP the Smart Way
Let's assume we have a file "users.csv", containing the following data:
"Username","Firstname","Lastname","Age"
"johndoe","John","Doe","21"
"hmiller","Hank","Miller","35"
Processing the data in PHP is rather straightforward using fgetcsv(). Let's display each user's name and age:
// open the file for reading
$f = fopen('users.csv', 'r');
// just a dummy read to get the header out of the way
fgetcsv($f);
// loop through each line
while ($data = fgetcsv($f))
{
// and write the line "{Firstname} {Lastname} is {age} years old"
echo($data[1] . ' ' . $data[2] . ' is ' . $data[3] . " years old.\n");
}
The problem here is the "magic numbers": $data[1], $data[2], and $data[3]. Whenever code contains magic numbers (numbers whose meaning isn't immediately obvious), they should be replaced with constants or other logic that makes their purpose apparent. For example, if you're working with geometry, use the constant M_PI rather than 3.14159.
Another reason is that columns could be added or rearranged, so all of a sudden users.csv looks like this:
"Username","Password","Firstname","Lastname","Age"
"johndoe","sekret","John","Doe","21"
"hmiller","trustno1","Hank","Miller","35"
If we ran our script, $data[1] now refers to the column containing the password. We can easily fix this by making a few minor changes to our script:
// open the file for reading
$f = fopen('users.csv', 'r');
// load the row containing the column headers
$hdr = fgetcsv($f);
// flip key -> value (rather than 1 => 'Password', we have 'Password' => 1)
$hdr = array_flip($hdr);
// loop through each line
while ($data = fgetcsv($f))
{
// and write the line "{Firstname} {Lastname} is {age} years old"
echo($data[$hdr['Firstname']] . ' ' . $data[$hdr['Lastname']] . ' is ' . $data[$hdr['Age']] . " years old.\n");
}
By simply using $hdr as a lookup array, we can access each field by its column name rather than by some magic number. But be careful: the column headers are case-sensitive, and it is possible for two columns to share the same header, in which case array_flip() will use the latter column.
But of course we can detect that if we wanted to:
// open the file for reading
$f = fopen('users.csv', 'r');
// load the row containing the column headers
$hdr = fgetcsv($f);
// flip-flip removes duplicate values like array_unique() does
// (keeping the last occurrence instead of the first), but at much better performance
$hdr_unique = array_flip(array_flip($hdr));
$dupes = array_diff(array_keys($hdr), array_keys($hdr_unique));
if (count($dupes))
{
echo("The following columns have duplicate headers:\n");
foreach (array_flip(array_flip($dupes)) as $dupe)
echo($dupe . ' - ' . $hdr[$dupe] . "\n");
exit();
}
// flip key -> value (rather than 1 => 'Password', we have 'Password' => 1)
$hdr = array_flip($hdr);
// loop through each line
while ($data = fgetcsv($f))
{
// and write the line "{Firstname} {Lastname} is {age} years old"
echo($data[$hdr['Firstname']] . ' ' . $data[$hdr['Lastname']] . ' is ' . $data[$hdr['Age']] . " years old.\n");
}
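If the goal is simply to address fields by name, another option is to turn every row into an associative array with array_combine(). The sketch below is one possible variation rather than the approach shown above: it lower-cases the headers to sidestep the case-sensitivity issue, and it writes a temporary sample file so that it can run on its own. The helper name read_csv_assoc() is made up for this example, and duplicate headers would still silently collapse to the latter column.

```php
<?php
// Hypothetical helper: reads a CSV file into a list of associative rows.
function read_csv_assoc($filename)
{
    $f = fopen($filename, 'r');
    if ($f === false) {
        return array();
    }

    // explicit delimiter, enclosure and escape (the long-standing defaults)
    $hdr = fgetcsv($f, 0, ',', '"', '\\');
    if (!is_array($hdr)) {
        fclose($f);
        return array();
    }
    // normalize header names so lookups are not case-sensitive
    $hdr = array_map('strtolower', $hdr);

    $rows = array();
    while (($data = fgetcsv($f, 0, ',', '"', '\\')) !== false) {
        // skip blank lines and rows whose field count doesn't match the header
        if (count($data) !== count($hdr)) {
            continue;
        }
        $rows[] = array_combine($hdr, $data);
    }
    fclose($f);

    return $rows;
}

// write the sample data to a temporary file so the sketch is self-contained
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp,
    "\"Username\",\"Firstname\",\"Lastname\",\"Age\"\n" .
    "\"johndoe\",\"John\",\"Doe\",\"21\"\n" .
    "\"hmiller\",\"Hank\",\"Miller\",\"35\"\n");

$users = read_csv_assoc($tmp);
unlink($tmp);

foreach ($users as $user) {
    echo $user['firstname'] . ' ' . $user['lastname'] . ' is ' . $user['age'] . " years old.\n";
}
```

The trade-off is one extra array per row, which is negligible for small files but worth keeping in mind for very large ones.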
That's all folks!
Creating bitmap fonts with PHP
The GD library enables PHP to load custom fonts (with imageloadfont()) and add text to images. But it can also be leveraged to create interesting effects, particularly when one uses PHP to first create a custom font.
To create a custom font we need to know its file format. Fortunately we don't have to look far: the PHP manual page on imageloadfont() explains the layout in "Table 1". The font file contains the number of characters in the font, the ASCII value of the first character, the pixel width of each character, the pixel height of each character, and finally the data describing the actual characters, one byte per pixel per character. I recommend you look at the table, which breaks down the information nicely. The GD library web site describes the actual C data structure.
The next step is to generate the font file containing data according to these specifications. A relatively easy way to do this is using PHP's pack() function.
So let's get started. Here's a short PHP script that will create the header, characters and write all data into the "myfont.fnt" file. (If you're trying this for yourself on a Linux server, remember to check file permissions, so that PHP is able to actually write the file.)
$char_head =
'01000000'. // only one char
'20000000'. // space (0x20) is first char
'08000000'. // 8 pixels wide
'08000000'; // 8 pixels high
$char_data =
'FFFF0000000000FF'.
'0000FF000000FF00'.
'0000FF000000FF00'.
'0000F0000000FF00'.
'0000FF000000FF00'.
'0000FF000000FF00'.
'0000FF000000FF00'.
'000000FFFFFF0000';
$fontdata = pack('H*', $char_head . $char_data);
// write the font data to file
$file = fopen('myfont.fnt', 'w');
fwrite($file, $fontdata);
fclose($file);
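Before loading the font with GD, it can be reassuring to check that the bytes really match the layout described earlier. The following sketch rebuilds the same header in memory and decodes it again with unpack(). The 'V' format code (32-bit, little endian) is chosen only because the hex strings above are written in little-endian order, the array key names are my own, and the pixel data is replaced by dummy zero bytes.

```php
<?php
// the same header as above: four 32-bit integers written as little-endian hex
$char_head = '01000000' . '20000000' . '08000000' . '08000000';
// 8 x 8 = 64 bytes of dummy pixel data (all pixels off)
$char_data = str_repeat('00', 64);

$fontdata = pack('H*', $char_head . $char_data);

// decode the four header integers as little-endian 32-bit values
$hdr = unpack('Vcount/Vfirst/Vwidth/Vheight', $fontdata);

// header (16 bytes) plus one byte per pixel per character
$expected = 16 + $hdr['count'] * $hdr['width'] * $hdr['height'];

echo 'characters: ' . $hdr['count'] . "\n";
echo 'first char: ' . $hdr['first'] . "\n";
echo 'cell size:  ' . $hdr['width'] . ' x ' . $hdr['height'] . "\n";
echo 'size ok:    ' . (strlen($fontdata) === $expected ? 'yes' : 'no') . "\n";
```

If the last line reports "no", the header and the amount of pixel data disagree, which is worth fixing before handing the file to imageloadfont().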
Now we just need to put the new font to use. The following script loads the font, creates an image of 32 x 8 pixels, writes the "space" character four times in blue on a gray background, and sends it to the web browser.
// load the font
$myfont = imageLoadFont('myfont.fnt');
// create the image canvas (4 characters wide)
$image = imageCreate( 8*4, 8 );
// allocate the colors (first one is background)
$gray = ImageColorAllocate($image, hexdec('cc'), hexdec('cc'), hexdec('cc'));
$blue = ImageColorAllocate($image, hexdec('33'), hexdec('66'), hexdec('99'));
// write the string (four spaces)
imageString($image, $myfont, 0, 0, '    ', $blue);
// send image to browser
header('Content-type: image/png');
imagePNG($image);
However, there's always at least one caveat, and the bad news is right in the PHP manual:
The font file format is currently binary and architecture dependent. This means you should generate the font files on the same type of CPU as the machine you are running PHP on.
Thus, the issue at hand is whether to write the data in big-endian or little-endian byte order. Currently, the code snippet above uses the little-endian format, and that works just fine on, say, my PC. But that may not be true for other computer architectures.
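If you are unsure which byte order your own machine uses, pack() can tell you. This is just a small sketch: it relies on the documented pack() format codes 'L' (machine byte order), 'V' (always little endian) and 'N' (always big endian), and the variable names are arbitrary.

```php
<?php
// 'L' packs in machine byte order, 'V' is always little endian,
// 'N' is always big endian; comparing them reveals the native order
$native = pack('L', 1);

if ($native === pack('V', 1)) {
    $byte_order = 'little endian';
} elseif ($native === pack('N', 1)) {
    $byte_order = 'big endian';
} else {
    $byte_order = 'unknown';
}

echo "This machine is $byte_order.\n";
echo 'Native bytes for the integer 1: ' . bin2hex($native) . "\n";
```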
My plan for this article was to demonstrate that the font no longer works once the byte order of $char_head is changed. While trying this, however, I ran across an interesting feature that looks like a half-implemented endianness detection routine for cross-platform compatibility. For instance, GD gives an error when I convert just the first integer (the character count) to big endian, yet it works again when the width and height integers are converted as well. Note, however, that the second integer must remain in little-endian format. This threw me off, and I had to investigate. It turns out that the PHP source (specifically ext/gd/gd.c, in the definition of PHP_FUNCTION(imageloadfont)) contains a sanity check: PHP essentially checks the character count in the header against the actual file size, and if the two don't match, it simply flips the byte order. Because it neglects to flip the offset (the first character in the font) as well, this check does not guarantee architecture independence, and technically this should be filed as a bug on PHP's bug tracker.
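To make the described behaviour concrete, here is a rough PHP re-enactment of that check. It is based solely on the description above rather than on the actual C source, the function names flip32() and guess_header() are hypothetical, and the header is decoded as little endian purely for the sake of the demonstration.

```php
<?php
// reverse the byte order of a 32-bit integer
function flip32($n)
{
    $u = unpack('Nvalue', pack('V', $n));
    return $u['value'];
}

// Hypothetical re-enactment: decode the header, compare the body size it
// implies with the real one, and flip count/width/height if they disagree.
// The offset ("first") is deliberately left untouched, as described above.
function guess_header($fontdata)
{
    $hdr  = unpack('Vcount/Vfirst/Vwidth/Vheight', $fontdata);
    $body = strlen($fontdata) - 16; // everything after the 16-byte header
    $hdr['flipped'] = false;

    if ($hdr['count'] * $hdr['width'] * $hdr['height'] != $body) {
        foreach (array('count', 'width', 'height') as $key) {
            $hdr[$key] = flip32($hdr[$key]);
        }
        $hdr['flipped'] = true;
    }
    return $hdr;
}

// count, width and height in big endian, offset still in little endian,
// followed by 64 bytes of dummy pixel data
$mixed = pack('H*',
    '00000001' . '20000000' . '00000008' . '00000008' . str_repeat('00', 64));

$hdr = guess_header($mixed);
echo 'flipped: ' . ($hdr['flipped'] ? 'yes' : 'no')
   . ', count: ' . $hdr['count']
   . ', first: ' . $hdr['first']
   . ', size: ' . $hdr['width'] . ' x ' . $hdr['height'] . "\n";
// prints "flipped: yes, count: 1, first: 32, size: 8 x 8"
```

In this re-enactment, a file whose offset was also written in big endian would come out with a wrong first character, which mirrors the incomplete flip described above.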
However, to guarantee that the created font will work on the machine it was created on, we should make some modifications and use the following method instead:
$char_count = 1; // only one char
$char_first = ' '; // space is the first char
$char_width = 8;
$char_height = 8;
$char_data =
'FFFF0000000000FF'.
'0000FF000000FF00'.
'0000FF000000FF00'.
'0000F0000000FF00'.
'0000FF000000FF00'.
'0000FF000000FF00'.
'0000FF000000FF00'.
'000000FFFFFF0000';
// convert to binary data
$fontdata = pack('llllH*',
$char_count,
ord($char_first),
$char_width,
$char_height,
$char_data);
// write the font data to file
$file = fopen('myfont.fnt', 'w');
fwrite($file, $fontdata);
fclose($file);
That's all there is to it. The resulting image is this:
There are plenty of uses for this. Just off the top of my head: creating dynamic icons on the fly, creating colorful patterns around text, or creating maps for web-based games.
PHP Vulnerabilities Announced
Just saw the announcement on Slashdot by the Hardened-PHP Project. The vulnerabilities include pack(), unpack(), safe_mode_exec_dir bypass in multithreaded PHP, realpath() and unserialize().