marcus welz

Zend Framework and File Locking Pitfalls

Posted on August 3, 2008

Earlier today, while reading through the Zend Framework 1.6 RC1 release notes, I came across an interesting bug that has been fixed: [ZF-3382] Zend_Cache_Backend_File problems under very high load.

There are a lot of things to say about this issue. The obvious ones first:
1. Under typical operation (open, lock, read/write, close) you should never need to unlock a file explicitly, since fclose() implicitly releases the lock. Calling flock() with LOCK_UN before fclose() merely introduces a potential race condition between the lock release and the file close. And it'll be (very) hard to debug.
2. Opening the file with fopen() in "wb" mode (overwrite in binary mode) is a problem: the file is truncated the moment it is opened, before any lock has been acquired. Opening with "ab+" (append in binary mode) instead and then, once the lock is held, calling fseek() to offset 0 and ftruncate() to length 0 eliminates that race. Roof Top Solutions has an article on this.
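
The two points above can be combined into a small write helper. This is only a sketch and not the code from the Zend Framework patch: the function name write_locked() and the demo file path are made up for illustration. It opens the file in "ab+" mode so that nothing is truncated before the lock is held, and it leaves the unlocking to fclose().

```php
<?php
// Hypothetical helper for illustration; not taken from Zend_Cache_Backend_File.
function write_locked($path, $contents)
{
	// "ab+" creates the file if needed but, unlike "wb", does not truncate it on open
	$f = fopen($path, 'ab+');
	if ($f === false)
	{
		return false;
	}

	// block until we hold an exclusive lock
	if (!flock($f, LOCK_EX))
	{
		fclose($f);
		return false;
	}

	// only now, with the lock held, discard the old contents
	fseek($f, 0);
	ftruncate($f, 0);
	$ok = (fwrite($f, $contents) === strlen($contents));

	// push anything still buffered out to the file while the lock is held
	fflush($f);

	// no flock($f, LOCK_UN) here: closing the handle releases the lock
	fclose($f);
	return $ok;
}

// demo: the second write fully replaces the longer first one
$path = sys_get_temp_dir() . '/write_locked_demo.cache';
write_locked($path, 'first version, rather long');
write_locked($path, 'second');
echo file_get_contents($path) . "\n"; // prints "second"
```

Readers should take the same approach: acquire a shared lock (LOCK_SH) after opening, read, and let fclose() release it.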

Honestly, bugs of this type are the worst kind, precisely because they're so hard to identify. The buggy code runs and seems fine, but fails when you need it the most: under high load, such as when you're linked from digg, slashdot, or any other large site that sends massive amounts of traffic your way. Suddenly, instead of helping your site scale, the cache causes your web application to thrash while the cache files are getting clobbered and trampled on.

Now imagine this is an issue with your in-house component that you have to solve by yourself. First it would have to be isolated and identified, because the big picture is just that your application crumbles under high load. "Even with all the caching that it's doing. Not sure what's going on" might be the first thought. I know that it would take me quite some time to narrow it down to the caching layer, create stress tests, and experiment with various backends to see if, say, using memcache instead of files would fix the issue. And that's just identifying the problem. When it comes to fixing it, not explicitly unlocking isn't too hard to figure out, but the issue with fopen() in "wb" mode? That would have taken a while. Case in point: according to the issue ticket, it was created on June 4 and resolved on June 26, judging by the notes largely due to the efforts of Cody Pisto, who spent his afternoon on June 25 identifying the problems and creating a patch for this tricky issue.

This is also a great example of the benefits gained from the Zend Framework (and other open source frameworks and components). In buzzwordy marketing lingo: It's a time- and battle-tested, feature-rich platform of loosely coupled components that you can mix and match as you please, and it only gets better as its adoption rate increases.


Handling CSV data with PHP the Smart Way

Posted on June 14, 2005

Let's assume we have a file "users.csv", containing the following data:

"Username","Firstname","Lastname","Age"
"johndoe","John","Doe","21"
"hmiller","Hank","Miller","35"

Processing the data in PHP is rather straightforward using fgetcsv(). Let's display each user's name and age:

// open the file for reading
$f = fopen('users.csv', 'r');

// just a dummy read to get the header out of the way
fgetcsv($f);

// loop through each line
while (($data = fgetcsv($f)) !== false)
{
	// and write the line "{Firstname} {Lastname} is {age} years old"
	echo($data[1] . ' ' . $data[2] . ' is ' . $data[3] . " years old.\n");
}

The problem here is the "magic numbers": $data[1], $data[2], and $data[3]. Any time there are magic numbers (numbers whose meaning isn't immediately obvious), they should be replaced with constants or other logic that makes their purpose apparent. For example, if you're working with geometry, use the constant M_PI rather than 3.14159.
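
As a small illustration of the constants approach, here is a sketch in which the column positions get names. The constant names COL_FIRSTNAME, COL_LASTNAME, and COL_AGE are made up for this example, and the row is hard-coded in the shape fgetcsv() would return for the sample file.

```php
<?php
// Hypothetical constant names for the column positions in users.csv
define('COL_FIRSTNAME', 1);
define('COL_LASTNAME', 2);
define('COL_AGE', 3);

// one data row, as fgetcsv() would return it for the sample file
$data = array('johndoe', 'John', 'Doe', '21');

// the indexes now say what they mean
$line = $data[COL_FIRSTNAME] . ' ' . $data[COL_LASTNAME] . ' is ' . $data[COL_AGE] . " years old.\n";
echo $line; // prints "John Doe is 21 years old."
```

This makes the code easier to read, but the positions are still fixed in the script.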

Another reason is that the columns could be rearranged, so all of a sudden we're looking at users.csv in the following format:

"Username","Password","Firstname","Lastname","Age"
"johndoe","sekret","John","Doe","21"
"hmiller","trustno1","Hank","Miller","35"

If we ran our script now, $data[1] would refer to the column containing the password. We can easily fix this by making a few minor changes to our script:

// open the file for reading
$f = fopen('users.csv', 'r');

// load the row containing the column headers
$hdr = fgetcsv($f);

// flip key -> value (rather than 1 => 'Password', we have 'Password' => 1)
$hdr = array_flip($hdr);

// loop through each line
while (($data = fgetcsv($f)) !== false)
{
	// and write the line "{Firstname} {Lastname} is {age} years old"
	echo($data[$hdr['Firstname']] . ' ' . $data[$hdr['Lastname']] . ' is ' . $data[$hdr['Age']] . " years old.\n");
}

By simply using $hdr as a lookup array, we can access each field by its column name rather than by some magic number. But be careful: the column header is case-sensitive, and it is possible that two columns have the same header, in which case array_flip() will keep the index of the last such column.

But of course we can detect that if we wanted to:

// open the file for reading
$f = fopen('users.csv', 'r');

// load the row containing the column headers
$hdr = fgetcsv($f);

// flip-flip removes duplicate values like array_unique() does,
// but at much better performance (note that it keeps the last
// occurrence of a value, whereas array_unique() keeps the first)
$hdr_unique = array_flip(array_flip($hdr));
$dupes = array_diff(array_keys($hdr), array_keys($hdr_unique));
if (count($dupes))
{
	echo("The following columns have duplicate headers:\n");
	// $dupes holds the indexes of columns whose header appears again later
	foreach ($dupes as $dupe)
	{
		echo($dupe . ' - ' . $hdr[$dupe] . "\n");
	}
	exit();
}

// flip key -> value (rather than 1 => 'Password', we have 'Password' => 1)
$hdr = array_flip($hdr);

// loop through each line
while (($data = fgetcsv($f)) !== false)
{
	// and write the line "{Firstname} {Lastname} is {age} years old"
	echo($data[$hdr['Firstname']] . ' ' . $data[$hdr['Lastname']] . ' is ' . $data[$hdr['Age']] . " years old.\n");
}
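
The case-sensitivity caveat mentioned above can be softened by normalizing the header row before using it. The following is one possible variation, not part of the original script: it trims and lower-cases the header names, then uses array_combine() to turn each row into an associative array. The helper name read_users() and the demo file are invented for illustration. Rows whose field count differs from the header are simply skipped here.

```php
<?php
// Hypothetical helper: returns each CSV row as array('columnname' => value, ...)
function read_users($path)
{
	$rows = array();
	$f = fopen($path, 'r');
	if ($f === false)
	{
		return $rows;
	}

	// delimiter, enclosure and escape character are spelled out explicitly
	$hdr = fgetcsv($f, 0, ',', '"', '\\');
	if ($hdr === false || $hdr === array(null))
	{
		fclose($f);
		return $rows;
	}

	// normalize the header names so that "FirstName" and "firstname" match
	$hdr = array_map('strtolower', array_map('trim', $hdr));

	while (($data = fgetcsv($f, 0, ',', '"', '\\')) !== false)
	{
		// skip blank or malformed lines
		if (count($data) !== count($hdr))
		{
			continue;
		}
		$rows[] = array_combine($hdr, $data);
	}

	fclose($f);
	return $rows;
}

// demo file with inconsistently capitalized headers
$csv = <<<'CSV'
"Username","FirstName","LASTNAME","Age"
"johndoe","John","Doe","21"
"hmiller","Hank","Miller","35"
CSV;
$path = sys_get_temp_dir() . '/users_demo.csv';
file_put_contents($path, $csv);

foreach (read_users($path) as $user)
{
	echo($user['firstname'] . ' ' . $user['lastname'] . ' is ' . $user['age'] . " years old.\n");
}
```

Note that array_combine() behaves like array_flip() when headers repeat: the last column with a given name wins, so the duplicate check is still worth running first.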

That's all folks!
