http://xkcd.com/936/

His argument would be sound - if passwords were of unlimited length.
UNIX passwords used to be limited to 8 characters; Linux has probably extended that. Windows NT, and of course every version of Windows built on the NT password core, allows 14 bytes maximum. Using only lower case, that gets you _two_ common words, if you are lucky. Similarly, DES used a 56-bit key. Most encryption algorithms are in the 256-bit (32-byte) range now for "good" encryption (that is, the NSA has to spend more than a minute to crack it).
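To put those limits side by side, here is a rough keyspace comparison in Python. It is purely illustrative; the numbers are just the ones quoted above, not measurements of any real system:

```python
import math

# Back-of-the-envelope keyspace comparison for the limits mentioned above.
sizes = {
    "14 lowercase characters": 26 ** 14,
    "56-bit DES key": 2 ** 56,
    "256-bit modern key": 2 ** 256,
}

for label, space in sizes.items():
    print(f"{label}: about 2^{math.log2(space):.0f} possibilities")
```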
Furthermore, the standard method of encrypting a document means that any successful guess is almost always recognized, because the decrypted characters at that point suddenly become readable. Thus, once a document encrypted with a 4-plain-text-word password reveals a few characters, the computer can instantly switch to a dictionary attack on those characters and recover the whole word in a few seconds.
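As a toy illustration of that switch-over, here is a sketch in Python; the WORDS list is just a small stand-in for a real dictionary file:

```python
# Once a few leading characters of a passphrase word are recovered,
# the remaining candidates collapse to a handful of dictionary entries.
WORDS = ["correct", "horse", "battery", "staple", "corridor", "cornea"]

def complete(prefix):
    """Return dictionary words consistent with the recovered prefix."""
    return [w for w in WORDS if w.startswith(prefix)]

print(complete("cor"))   # ['correct', 'corridor', 'cornea']
print(complete("bat"))   # ['battery']
```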
The old reference for this stuff used to be _Applied Cryptography_, but, being from 1996, it probably doesn't cover later designs like elliptic-curve cryptography. An overview of the current state of the art can be found in the textbook _Cryptography Engineering_.
What does he mean by "bits of entropy"? Let's explain that a little. Say your common-word vocabulary is about 1000 words (most people use about 1000 words for 99% of their communication). 1024 possibilities is 10 bits. But those 1000 words are typically drawn from an over-set of roughly 2000 words almost everybody uses commonly, hence 11 bits per word. Since the 4 words are _random_, they have no logical connection, so you can add the entropy bits together.
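Worked out in numbers, assuming the ~2000-word pool above:

```python
import math

# Entropy of 4 randomly chosen words from a ~2000-word common-word pool.
pool_size = 2000
bits_per_word = math.log2(pool_size)   # ~11 bits per word
total = 4 * bits_per_word              # words are independent, so bits add
print(f"{bits_per_word:.1f} bits per word, {total:.0f} bits total")
# -> 11.0 bits per word, 44 bits total
```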
Now, a randomly chosen character that you can type easily is actually worth about 6 bits, not 8. If you used 8 random characters, you would get 48 bits of entropy. But take a single uncommon dictionary word (a 65536-word list is 16 bits) and apply a few modifier modes (leet substitutions, upper/lower case): that lets you multiply by roughly 1.5, so you end up with about 24 bits of entropy. Adding 2 characters at the end would add 6 bits each, for about 36 bits, perhaps somewhat less.
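And the same arithmetic for the estimates in this paragraph (these are just the rough numbers above, nothing more rigorous):

```python
# Reproducing the rough estimates above; a back-of-the-envelope sketch,
# not a formal entropy calculation.
random_8_chars = 8 * 6                  # 8 random typeable chars, ~6 bits each
one_word       = 16                     # uncommon word from a 65536-word list
modified_word  = one_word * 1.5         # leet / upper-lower "modes", per above
with_suffix    = modified_word + 2 * 6  # plus 2 trailing characters

print(random_8_chars, modified_word, with_suffix)   # 48 24.0 36.0
```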