[...] Additionally, if you have a paid account, a robots.txt file will be added to your personalized subdomain (http://exampleusername.livejournal.com/).
(FAQ #50)
It would seem that this now works for Free Accounts as well, so a revision of the FAQ would be in order. :)
(How does this work for communities?)
I don't know if it's deliberate or an oversight. A staffer did say in a comment that there didn't seem to be any reason they couldn't make robots.txt user-editable in the future. I think that user previously had a paid account anyway, though.
I guess the change needs to be documented somewhere though yeah.
I might misunderstand what you mean here, but if I did, so will others -- whether or not a paid account subdomain has a robots.txt file that blocks robots depends on whether or not the account owner configured the account that way. For instance, mine has no restrictions. No site functionality has changed, but users who used to have a www.livejournal.com URL (and thus had meta tags) now have a subdomain URL (and thus have a robots.txt file).
Any downloady-thing that respects robots.txt should also respect the equivalent meta tags; the only reason that there were both meta tags and robots.txt was that on the old www.livejournal.com/users/exampleusername URL, the number of users blocking robots would have made the sitewide robots.txt file too big. That user just needs to turn off the "block search engines" option while they download it, or tell their downloady-thing to ignore robots.txt.
The only documentation change necessary is what freso implied above, getting rid ( ... )
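For anyone unfamiliar with the two mechanisms being compared here, this is roughly what each one looks like. The exact contents LiveJournal serves are my assumption; this is just the conventional form of each:

```
# robots.txt, served at the root of the journal subdomain
# (e.g. http://exampleusername.livejournal.com/robots.txt)
# when "block search engines" is enabled:
User-agent: *
Disallow: /

<!-- the equivalent meta tag, placed in each journal page's <head>
     back when journals lived under www.livejournal.com/users/: -->
<meta name="robots" content="noindex, nofollow">
```

The robots.txt file covers the whole subdomain in one place, while the meta tag has to be repeated on every page, which is why the per-subdomain file only became practical once each user got their own subdomain.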
Thanks for clarifying how it works, mendel. I didn't properly understand how the paid account subdomain robots.txt file worked. I mistakenly thought that there was one on every paid subdomain and that it couldn't be disabled at all.
I always had the 'block robots' option enabled for my journal, via editinfo.bml. It never occurred to me that it configured the robots.txt file as well as the META tags! Although I'd have kept it enabled anyway, I used to think there should be a way for users who did want to disable it to do so. It was something I thought about posting to suggestions, but never did. *g*
And... why? If you want to block robots and spiders, surely you want to block robots and spiders? robots.txt prevents a lot of page loading (i.e., robots.txt is read first; if it says not to read further, nothing further is read), and it also protects against messed-up HTML that renders the tags unparseable (if robots.txt says a page is blocked, there's no need to load and parse the HTML to find out whether it's blocked).
There are of course robots and spiders that don't read robots.txt, but I wouldn't count on them to honour the meta tags either.
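The check-first behaviour described above can be sketched with Python's standard `urllib.robotparser` module. The subdomain, rules, and user-agent string here are illustrative assumptions, not what LiveJournal actually serves:

```python
# A well-behaved crawler parses robots.txt before fetching any page,
# so a blocked journal is skipped without its (possibly broken) HTML
# ever being loaded or parsed.
from urllib import robotparser

# Hypothetical file a journal subdomain might serve when the
# "block search engines" option is on:
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The crawler consults the parsed rules before requesting each URL;
# here the whole site is disallowed, so no page is ever fetched.
blocked = not parser.can_fetch(
    "SomeBot/1.0", "http://exampleusername.livejournal.com/2007/01/"
)
print(blocked)  # True
```

A crawler that only honoured the meta tag would instead have to download and parse every page just to discover it wasn't supposed to index it.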