But I don't begin any of my sentences that way. And I don't think you actually converse f2f with anyone who begins their sentences that way. So I can only conclude that your sadness is at least as much about the assumptions made by Google n-grams sampling as about anything else.
Oh, of course not -- that is one of the reasons that it is meaningless :-).
It isn't so much making assumptions, though, as being un-picky -- the n-grams sample exactly what they claim to, i.e., everything. Based on this, it seems... more plausible than not... that out of all English sentences ever produced[1], more of them really are "All rights reserved" than anything else.
[1] Not that "produced" here is particularly well defined either. Basically, it's a factoid. But such a *striking* one.
Re: public web pages != the English language
elsmi, February 23 2010, 23:29:04 UTC
Yeah, that's one reason why I had that "more plausible than not" hedge.
The web is a substantial fraction of all English sentences ever produced (a lot of those pages are computer-produced, and consider that computers can produce sentences a lot faster than humans otherwise do), and humans just don't produce the same sentences over and over (unless you count things like backchannel utterances -- "uh huh", "yeah", which perhaps you should). That's really a lot of what's going on, and what makes it both plausible and meaningless.
That's rather different from "all English sentences ever produced".