PHP SimpleXML is broken: Part 1

Jan 02, 2016 18:24


The following code does not work as expected:

$xml = simplexml_load_string("text");
$node = $xml->xpath("/root/a")[0];
if ($node) process($node); // process($node) may not be called for some valid nodes

Unlike most other objects, a SimpleXMLElement may evaluate as false even when it is not null. Specifically, SimpleXMLElements representing empty ...
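A minimal sketch of the effect. The XML string here is an assumption made for illustration (the input in the snippet above did not survive intact), so the element names are hypothetical rather than the author's exact test case:

```php
<?php
// Hypothetical input: <a/> is an empty element, <b> has a text body.
$xml = simplexml_load_string('<root><a/><b>text</b></root>');

// xpath() returns an array of SimpleXMLElement objects.
$node = $xml->xpath('/root/a')[0];

$isObject = $node instanceof SimpleXMLElement; // the node was found
$isTruthy = (bool) $node;                      // yet it casts to false

var_dump($isObject); // bool(true)
var_dump($isTruthy); // bool(false)
```

So `if ($node)` skips a node that clearly exists, simply because it has no body and no attributes.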

hacker's diary, software development, web development

Comments 23

dennisgorelik January 3 2016, 00:43:10 UTC
> so whatever goes inside the “if” will be skipped for them.

What does "goes inside the “if”" mean?
In your example "$node" is inside of "if", right?

Do you mean that if XPath finds an empty element, the resulting $node would be null?

Or do you mean that if the input XML contains an empty node, the XPath search result would always evaluate to null, even if it found a non-empty node?

yatur January 3 2016, 00:48:37 UTC
Everything that may be misunderstood will be misunderstood.
I fixed the example, so it is now (hopefully) clearer.

dennisgorelik January 3 2016, 00:57:54 UTC
> Everything that may be misunderstood will be misunderstood.

Of course.

> I fixed the example

I do not see what you fixed.

If your xpath was "/root/a" instead of "/root/b", it would be clearer, I guess.
But your current xpath example is searching for a node that is not empty.

yatur January 3 2016, 01:05:52 UTC
> I do not see what you fixed.

The 'goes inside if' part.

> But your current xpath example is searching for node that is not empty.

Well, that's the bug I found. The node is not empty, but PHP thinks it is. But I agree, it's confusing in this setting. Changed query to /root/a.


Why PHP? dennisgorelik January 3 2016, 00:47:43 UTC
> You should never have ventured in that cruel PHP land.

So why do you program in PHP now?

Re: Why PHP? yatur January 3 2016, 00:52:06 UTC
I use PHP for small projects on my web site because

- It integrates well with Apache, which is my web server
- It is used by my blog engine, WordPress
- It's free
- It's a different language that gives you a different perspective
- I did not have serious issues with it before

Re: Why PHP? dennisgorelik January 3 2016, 01:00:59 UTC
> - It integrates well with Apache, which is my web server

Why do you use Apache as your web server?
Why not keep using IIS?

> - It is used by my blog engine, WordPress

That's a good reason.
But why program your blog?
You can just use it as is, right?

> - It's free

Not really.
Your research & development time is at least an order of magnitude more valuable than hosting price.

> - It's a different language that gives you a different perspective

That's a very important reason, I agree.

> - I did not have serious issues with it before

So what do you plan to do about it after you discovered serious issues?

Re: Why PHP? yatur January 3 2016, 01:08:58 UTC
> Why do you use Apache as your web server?
> Why not keep using IIS?

Because I run my web server at home, and for IIS to work right and not give stupid errors on too many connections, I would have to buy a retail Windows Server.

> more valuable than hosting price.

A what price? I don't host it :) I play with it at home on a machine that I got sort of for free a long time ago. I do crazy stuff with it that I would never be able to do on shared hosting, and dedicated hosting is too expensive. Research and development time is part of the fun.

> So what do you plan to do about it after you discovered serious issues?

I don't know yet.


dennisgorelik January 3 2016, 01:19:42 UTC
> If you want to check for null, you’d better do it explicitly: if (something !== null).

So problem solved then?

Just do not use questionable "abbreviated if evaluation" form.
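A rough sketch of what the explicit check could look like. The helper name `firstMatch` and the sample XML are made up for illustration, and the nullable return type assumes PHP 7.1 or later:

```php
<?php
// Hypothetical helper: returns the first XPath match, or null if there is none.
function firstMatch(SimpleXMLElement $xml, string $path): ?SimpleXMLElement
{
    $matches = $xml->xpath($path);
    // xpath() normally returns an array; guard against a non-array error result.
    return (is_array($matches) && count($matches) > 0) ? $matches[0] : null;
}

// Assumed sample document, not the one from the post.
$xml = simplexml_load_string('<root><a/><b>text</b></root>');

$found   = firstMatch($xml, '/root/a');       // empty element, still a match
$missing = firstMatch($xml, '/root/missing'); // no such element

var_dump($found !== null);   // bool(true)
var_dump($missing !== null); // bool(false)
```

With `!== null` the empty element is still processed, while a genuinely missing node is skipped.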

yatur January 3 2016, 03:22:37 UTC
> So problem solved then?

Sort of. It's still a stupid bug, so my trust in the whole SimpleXML thing is destroyed.

dennisgorelik January 3 2016, 05:17:15 UTC
How did you manage to accumulate that trust in the first place?

PHP is famous for being buggy.

yatur January 3 2016, 05:37:56 UTC
> How did you manage to accumulate that trust in the first place?

Whatever I did just worked. If it didn't, it was not PHP's fault.

> PHP is famous for being buggy.

I must have missed that fame. Do you have source(s)?


spamsink January 3 2016, 02:16:39 UTC
Isn't the <a/> construct just syntactic sugar for <a></a>? If so, the tag body is an empty string, and, consequently, it "sort of makes sense".

yatur January 3 2016, 03:09:32 UTC
> it "sort of makes sense"

Hardly. The node represents the whole tag, not the tag body.

I can see how null value evaluates to false. The farther the object gets from null, the shakier is the ground to evaluate it to false. The numbers 0 and 0.0 evaluating to false is alright, because it's C legacy and we are used to it. I can sort of see empty arrays and empty strings evaluating to false, but it's already a stretch. String "0" evaluating to false is total nonsense.
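For reference, a small sketch of how PHP's standard boolean casts treat the built-in values listed above. The `'0.0'` string is an extra added here for contrast:

```php
<?php
// Standard PHP boolean casts for the values mentioned above.
$casts = [
    'null'         => (bool) null,
    'int 0'        => (bool) 0,
    'float 0.0'    => (bool) 0.0,
    'empty array'  => (bool) [],
    'empty string' => (bool) '',
    'string "0"'   => (bool) '0',
    'string "0.0"' => (bool) '0.0', // the only true value in this list
];

foreach ($casts as $label => $value) {
    echo str_pad($label, 14), ' => ', var_export($value, true), "\n";
}
```

Everything in the list prints false except the string "0.0", which shows how narrow the special case for "0" is.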

With 'empty tags' we are not even talking about a built-in type. If I take a different class representing XML tags, its behavior would be different (which is indeed the case with DOMElement).

Besides, tags with attributes but no body evaluate to true. So do tags like <a>0</a>. DOMElements representing empty tags do not evaluate to false. On top of it, the implementation is buggy like hell. Overall, this makes it a very shaky and surprising case.

They would have been better off leaving it alone and letting the programmer check for the empty body if he wants.
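A sketch of the inconsistency, assuming the hypothetical document below and that both the SimpleXML and DOM extensions are loaded:

```php
<?php
// Assumed sample: <a/> has no attributes and no body, <b> has only an attribute.
$xml = simplexml_load_string('<root><a/><b id="1"/></root>');

$emptyTag = $xml->xpath('/root/a')[0];
$withAttr = $xml->xpath('/root/b')[0];

// The same <a/> node, viewed through the DOM extension instead.
$domEmpty = dom_import_simplexml($emptyTag);

var_dump((bool) $emptyTag); // bool(false)
var_dump((bool) $withAttr); // bool(true)
var_dump((bool) $domEmpty); // bool(true)
```

The same empty tag is false as a SimpleXMLElement and true as a DOMElement, and adding an attribute flips the SimpleXML result.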

spamsink January 3 2016, 03:38:24 UTC
As far as I'm concerned, the object corresponding to an empty tag is akin to an empty associative array, so it would not be a big surprise to me if it evaluates to false.

yatur January 3 2016, 04:06:56 UTC
Well, I doubt they defined Boolean coercion for XML elements just so it would be annoying and surprising. There must have been enough people out there who thought it was a good idea, at least initially.

Eventually they did realize it is surprising, at least to some programmers. E.g. read the warning here: http://php.net/manual/en/function.simplexml-load-string.php.

You have to draw the line somewhere. Let's say an empty array evaluates to false. How about an array containing only empty arrays? Or a tag containing only empty tags? You want to keep it simple, otherwise consistent checking for false becomes very expensive.

They seem to be moving away from the idea that any empty container should evaluate to false: e.g. an empty object no longer evaluates to false in PHP >4.
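A quick sketch of where PHP draws that line for built-in containers today, assuming PHP 5 or later:

```php
<?php
$emptyArray  = (bool) [];             // empty array
$nestedEmpty = (bool) [[]];           // array holding one empty array
$emptyObject = (bool) new stdClass(); // object with no properties

var_dump($emptyArray);  // bool(false)
var_dump($nestedEmpty); // bool(true): only the top level is checked
var_dump($emptyObject); // bool(true)
```

Only the top-level element count matters for arrays, and a plain empty object is always true.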
