My understanding of XML is that it was intended more as a data interchange format than as something to store data in long-term. With a syntax as complex as XML's, that's a lot of parsing involved.
As it stands with your rewrite, there's still PHP code that is going to be parsed on every page load. Why not use serialize() to store your data in a format which needs even less parsing? If there are concerns about human-readability of the data, you could always check the timestamp of the PHP source and only re-generate the serialized data when it is missing or older than the source.
I wrote a cache system a while back that stored large pieces of data (more than 100K in size) in files as serialized strings. PHP loaded and ran unserialize() on them amazingly fast, in less than a second. It was nice.
Alternatively, why not just put everything into a database? :-) Since you mention formerly not knowing anything about MySQL, I'm assuming that you have some knowledge of it now.
Indeed, PHP arrays are less portable than XML because they are only quick to load if you're using PHP! A 150KB database takes about 30 milliseconds to parse, which seems fast enough. For the larger databases, I use a C program to do search-queries on the database file and output the "reduced" database with only matching entries, which can then be loaded into PHP very quickly.
I dislike the "serialize" format because it contains counted lengths, so editing of databases using Emacs is a pain. Loading 150KB from a serialized file takes only 10 msecs, but the whole point of XML is that readability is more valuable than CPU time (within limits, which DOMXML exceeds). So I'm basically using PHP arrays to achieve the goals that the XML people set for themselves. And it's portable to any machine that runs PHP, which is all the interesting ones!
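For concreteness, here is a minimal sketch of the kind of hand-editable PHP-array "database" I mean (the file name and field names are made up for illustration):

```php
<?php
// db.php -- the whole "database" is one hand-editable PHP array.
// (Field names here are hypothetical.)
$db = array(
    'apple'  => array('color' => 'red',    'price' => 3),
    'banana' => array('color' => 'yellow', 'price' => 2),
);

// Loading it is a plain include; the PHP parser does the work
// that an XML library would otherwise do at runtime:
//   include 'db.php';
echo $db['banana']['color'], "\n"; // prints "yellow"
```

You can open the file in Emacs, change a value, and save, with no counted lengths to keep in sync the way serialize() output requires.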
I've managed to pick up a little SQL by reading the code written by the other consultants at Company Γ, but I've never even done an INNER JOIN. I generally avoid using MySQL because
Maybe I didn't make it clear in my original comment, but the serialized files would not be intended to be edited by hand. You'd still edit the PHP files by hand, but they would only be read by your code on the first pass, at which point serialized copies of the data would be written and then read on future passes. The code would look something like this:
if (!cache_file()) {
    $data = read_php_file();
    write_cache_file($data);
} else {
    $data = read_cache_file();
}

In this example, the cache files hold the serialized data. That is essentially the same algorithm that I implemented on a project a while back so I could avoid some particularly expensive database queries
( ... )
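Fleshing out that pseudocode, a sketch might look like the following. I'm assuming the hand-edited source lives in data.php (defining $data) and the cache in data.cache; all of these names are hypothetical:

```php
<?php
// data.php is the hand-edited PHP source defining $data;
// data.cache holds its serialized copy. Both names are made up.
function load_data($src = 'data.php', $cache = 'data.cache')
{
    // Rebuild the cache if it is missing or older than the source,
    // so hand edits to the PHP file are picked up automatically.
    if (!file_exists($cache) || filemtime($cache) < filemtime($src)) {
        include $src;                        // defines $data
        file_put_contents($cache, serialize($data));
        return $data;
    }
    // Fast path: one unserialize(), no PHP parsing of the data.
    return unserialize(file_get_contents($cache));
}
```

On every request after the first, only the unserialize() branch runs, and the timestamp comparison makes regeneration automatic whenever the source file is edited.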