How to back up one's journal?

Nov 17, 2012 17:05

I searched for a solution that saves the journal in a format that can be re-imported (into sites running the LJ code); I'm on Ubuntu Linux. I tested some programs, and this method worked for me right away (as it's Python, it should work under Windows, too):

  1. on the download page of ljmigrate, save the raw version of all files; 'ljmigrate.py' and 'ljmigrate.cfg.sample' are the two that are then used. Rename the latter to end in '.cfg' (= remove '.sample')...

  2. and edit it to fit your needs. If you just want a backup without migrating elsewhere, you can copy/paste my version of the config file over its contents and just replace the 'YOUR_...' placeholders with your username/password:

    [settings]
    # migrate to another site?
    # Set to True if you want to repost entries, False if not.
    migrate: False
    # You can restrict migration to just the entries that have one of the following
    # tags. Use commas to separate tags in the list. Comment this out if you want
    # to migrate everything.
    #migrate-these-tags: example 1, example tag 2, third tag I want to copy
    # generate html for each entry? True or False
    generate-html: True
    # If you're migrating a community, do you want to grab your own posts or everybody's?
    migrate-community-posts-by-others: False

    [proxy]
    # This section is optional. Use it if you use a proxy to do your web browsing.
    # Leave it commented out if you don't.
    #host: localhost
    #port: 8000

    [source]
    # The host of the journal you're backing up.
    server: www.livejournal.com
    # Your user name.
    user: YOUR_USERNAME
    # Your password. Will not be sent in the clear; this tool uses the challenge
    # response authentication mechanism.
    password: YOUR_PASSWORD
    # communities: sourcecomm1 sourcecomm2

    [destination]
    # Where you're migrating to, if you are. You can omit this section or leave
    # it commented out if you're not copying your journal to another site.
    #server: insanejournal.com
    #user: myotheruser
    #password: myotherpassword
    # communities: destcomm1 destcomm2

    # [nuke]
    # optional section for the nuclear option; see README for details
    # server: http://insanejournal.com
    # user: myotheruser
    # password: myotherpassword
    # community: comm_to_nuke (optional)

  3. make 'ljmigrate.py' executable; in a console, in the current directory, that is e.g.:

    sudo chmod 700 ljmigrate.py
    sudo chown YOUR_HOME_USERNAME ljmigrate.py

  4. with both files in the same directory, launch the script from the command line:

    ./ljmigrate.py
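
  Under Windows (can't confirm, but as said it's plain Python) skip step 3 and, with Python installed, launch the script via the interpreter instead:

    python ljmigrate.py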

  5. The program will save the backup in a subdirectory named "www.livejournal.com" and report when it's done (usually after a few minutes for a couple of hundred entries; if you don't get back to the prompt, press ENTER).

The HTML version compiled by this tool doesn't convert the hyperlinks for offline browsing (instead they still point to the posts' URLs at livejournal.com), and it seems there's no Linux tool (can't speak for Windows) or web service yet that avoids broken links when entries in one journal point to each other.
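
A small post-processing script can rewrite such links to point at the local copies instead. The sketch below is untested and rests on assumptions about ljmigrate's on-disk layout: that each entry ends up as an .html file named after its entry number, and that cross-links look like http://YOUR_USERNAME.livejournal.com/12345.html. Check your own backup and adjust ROOT, USER and the filename pattern before running it:

    #!/usr/bin/env python
    # Sketch: rewrite cross-links between one's own entries so they point
    # at the local backup copies instead of livejournal.com.
    # ASSUMED layout (verify against your backup!): each entry is saved as
    # <entry-number>.html somewhere below the backup directory.
    import os
    import re

    ROOT = 'www.livejournal.com'    # backup directory created in step 5
    USER = 'YOUR_USERNAME'
    # matches links like http://YOUR_USERNAME.livejournal.com/12345.html
    PATTERN = re.compile(r'http://%s\.livejournal\.com/(\d+)\.html' % USER)

    # index the local entry files by entry number (assumed naming scheme)
    local = {}
    for dirpath, dirnames, filenames in os.walk(ROOT):
        for name in filenames:
            m = re.match(r'(\d+)\.html$', name)
            if m:
                local[m.group(1)] = os.path.join(dirpath, name)

    # rewrite every saved html file in place
    for dirpath, dirnames, filenames in os.walk(ROOT):
        for name in filenames:
            if not name.endswith('.html'):
                continue
            path = os.path.join(dirpath, name)
            html = open(path).read()
            def to_local(m):
                target = local.get(m.group(1))
                if target is None:
                    return m.group(0)   # entry not in the backup: keep URL
                return os.path.relpath(target, dirpath)
            fixed = PATTERN.sub(to_local, html)
            if fixed != html:
                open(path, 'w').write(fixed)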

If your journal contains many of these cross-references, it can be helpful to also fetch an offline copy of the public entries, e.g. with wget, so that you can later easily look up where a hyperlink of interest pointed. However, this uses bandwidth, so it shouldn't be done often (I do it once a month at most), and only with settings that aren't demanding on the server. wget requires you to know what you're doing, so read the manual first, and watch the process and disk usage. You may also want to exclude galleries (scrapbook); see the sketch after the command below.

I use this command line for most homepages I mirror (it's not LJ-specific); with "-w 2 --random-wait" it leaves roughly 1 to 3 seconds between requests, and it's meant for small sites:

wget -m -r -np -w 2 --random-wait -l 5 -E -k -p http://YOUR_USERNAME.livejournal.com/
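
To exclude galleries you can use wget's "-X" / "--exclude-directories" option, which skips whole directory trees. The line below is only a sketch: the "/pics" path is a placeholder, so first check under which host and path your gallery actually lives (scrapbook was often served from a separate host, which wget doesn't follow anyway unless you add "-H"):

wget -m -r -np -w 2 --random-wait -l 5 -E -k -p -X /pics http://YOUR_USERNAME.livejournal.com/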

"-l 5" means the number of levels/link-depth (=fetches current page, the pages linked thereon, others linked on these, and 2x so on); has to be set, specifically; if the archive/calendar is linked on the recent entry page (or one doesn't have many posts/50 per recent entry page), "4" already should be enough.

