[ext_sites] FAQ #50: My content is on a search engine and I don't want it there. What can I do?

Jan 30, 2006 13:55

[...] Additionally, if you have a paid account, a robots.txt file will be added to your personalized subdomain (http://exampleusername.livejournal.com/).
(FAQ #50)

It would seem that this works for Free Accounts as well now, and a revision of the FAQ would thus be in order. :)
(How does this work for communities?)
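For reference, the 'deny all' robots.txt being discussed is conventionally just two lines; the exact file LiveJournal serves isn't quoted here, but the standard deny-all form looks like this:

    User-agent: *
    Disallow: /

Any crawler that honours the Robots Exclusion Standard reads this once and then skips the entire subdomain.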

status-resolved, faq50, cat-ext-sites


Comments 5

decadence1 January 30 2006, 14:08:33 UTC
Mmm, so it does. I didn't know that. Free accounts, including communities, now have the 'deny all' robots.txt file. It wasn't possible to disable it in the past for paid account subdomains; but back then, the other forms of URL were the primary ones. It might cause problems for users, especially communities, who'd optimised or configured for searchability (yay for invented words!).
I don't know if it's deliberate or an oversight. A staffer did say in a comment that there didn't seem to be any reason they couldn't make robots.txt user-editable in the future. I think that user previously had a paid account anyway, though.

I guess the change needs to be documented somewhere, though, yeah.


mendel January 30 2006, 14:29:50 UTC
It wasn't possible to disable it in the past for paid account subdomains

I might misunderstand what you mean here, but if I do, so will others -- whether or not a paid account subdomain has a robots.txt file that blocks robots depends on whether the account owner configured the account that way. For instance, mine has no restrictions. No site functionality has changed, but users who used to have a www.livejournal.com URL (and thus had meta tags) now have a subdomain URL (and thus have a robots.txt file).

Any downloady-thing that respects robots.txt should also respect the equivalent meta tags; the only reason there were both meta tags and robots.txt is that on the old www.livejournal.com/users/exampleusername URLs, the number of users blocking robots would have made a sitewide robots.txt file too big, so per-user meta tags were used instead. That user just needs to turn off the "block search engines" option while they download, or tell their downloady-thing to ignore robots.txt.
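For anyone wondering what the meta-tag form looks like: the conventional per-page equivalent of a robots.txt block is a single line in each page's <head>. Whether LiveJournal emits exactly these attribute values is an assumption here, but the standard tag is:

    <meta name="robots" content="noindex, nofollow">

As for telling a downloady-thing to ignore robots.txt: wget, for one, accepts -e robots=off on the command line.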

The only documentation change necessary is what freso implied above, getting rid ( ... )


*blush* decadence1 January 30 2006, 16:12:01 UTC
I might misunderstand what you mean here, but if I do, so will others -- whether or not a paid account subdomain has a robots.txt file that blocks robots depends on whether the account owner configured the account that way. ...That user just needs to turn off the "block search engines" option while they download, or tell their downloady-thing to ignore robots.txt.

Thanks for clarifying how it works, mendel. I didn't properly understand how the paid account subdomain robots.txt file worked; I mistakenly thought there was one on every paid subdomain and that it couldn't be disabled at all.

I always had the 'block robots' option enabled for my journal, via editinfo.bml. It never occurred to me that it configured the robots.txt file as well as the META tags! Although I'd have kept it enabled anyway, I used to think there should be a way for users who wanted to disable it to do so. It was something I thought about posting to suggestions but never did. *g*


Re: *blush* freso January 31 2006, 10:44:19 UTC
It never occurred to me that it configured the robots.txt file as well as the META tags! Although I'd have kept it enabled anyway, I used to think there should be a way for users who wanted to disable it to do so.

And... why? If you want to block robots and spiders, surely you want to block robots and spiders? robots.txt prevents a lot of page loading (i.e., robots.txt is read first, and if it says not to read further, nothing further is fetched) and also protects against messed-up HTML that renders the meta tags unparseable (if robots.txt says a page is blocked, there's no need to load and parse its HTML to find out whether it's blocked).
There are of course robots and spiders that don't read robots.txt, but I wouldn't count on them to adhere to the guidelines in either one.
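The read-robots.txt-first behaviour described here is easy to demonstrate with a well-behaved client. A minimal sketch in Python, using the standard library's urllib.robotparser; the subdomain is the FAQ's example one and the entry URL is hypothetical:

    # A polite crawler fetches robots.txt once, then decides whether
    # any other page on the host may be fetched at all.
    from urllib import robotparser

    base = "http://exampleusername.livejournal.com"
    rp = robotparser.RobotFileParser()
    rp.set_url(base + "/robots.txt")
    rp.read()  # the only request made so far

    page_url = base + "/1234.html"  # hypothetical entry URL
    if rp.can_fetch("*", page_url):
        print("allowed; would fetch and parse", page_url)
    else:
        # With a deny-all robots.txt this branch runs: the page's HTML
        # is never loaded, so malformed meta tags never come into play.
        print("blocked by robots.txt; skipping", page_url)

With a deny-all file in place, can_fetch returns False and the HTML is never requested, which is exactly the saved page loading described above.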



bridgetester February 5 2006, 20:40:05 UTC
Well, it's done except for community and syndicated accounts. :/



