Rendering XML readable

Nov 19, 2005 17:39

Currently, there are two formats in which you can download your journal: CSV (comma-separated values) and XML (extensible markup language).

Both of these formats have advantages and disadvantages. CSV has the advantage that it can be imported into spreadsheet applications (for example, users have reported success with newer versions of Microsoft Excel), from which it can generally be ported into text-editing or word-processing applications. XML has the advantage that, given a little bit of help, it can be viewed directly from your browser.

This tutorial aims to explain how to put your downloaded XML file into a browser-readable format. Please note that, short of converting XML into HTML (either by hand or using an automated script), there is no perfect way to do this; and older browsers are generally incapable of rendering XML. LiveJournal plans, in future, to make it possible to download your journal in other formats, such as HTML, which may be more convenient; for the moment, however, this appears to be the best way.
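For readers who would rather convert the export to HTML with a script, the following Python sketch shows one possible approach. It assumes that the export's root element is livejournal and that each entry element contains the child elements named in the stylesheet in Step 4. The function name export_to_html and the HTML layout are illustrative choices, not anything LiveJournal provides.

```python
import html
import xml.etree.ElementTree as ET

# Child elements of <entry> to show, in display order. The names are taken
# from the selectors in the tutorial's stylesheet (an assumption about the
# export format, not a documented schema).
FIELDS = ["itemid", "eventtime", "logtime", "subject", "event",
          "security", "allowmask", "current_music", "current_mood"]


def export_to_html(xml_data):
    """Build a simple HTML page from the contents of an XML export.

    xml_data may be a str or bytes object holding the whole file.
    """
    root = ET.fromstring(xml_data)
    parts = ["<html><body>"]
    for entry in root.iter("entry"):
        parts.append('<div class="entry">')
        for name in FIELDS:
            value = entry.findtext(name)
            if not value:
                continue  # skip fields that are missing or empty
            if name != "event":
                # Escape everything except the entry body, which is assumed
                # to hold the entry's own HTML markup and is passed through.
                value = html.escape(value)
            parts.append(f'<div class="{name}">{value}</div>')
        parts.append("</div>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

To use it, read your saved XML file as bytes, pass the contents to export_to_html, and write the returned string to a file with the extension ".html". Because the entry body is inserted unescaped, the markup in your entries is applied rather than displayed.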

Step 1 is to obtain the XML file. FAQ #8 explains how to do this. Be sure to select XML as your output format (as opposed to CSV). Once you have obtained this file, save it to your hard drive. (Note: to do this, use your browser's "Save As..." feature, or alternatively use its "View Source" feature and use the resultant text file; if you are unsure how to do this, or encounter difficulties, consult your browser's documentation. Do not copy the code out of your browser window and paste it into a text editor, as that will not work properly.) Be sure to save it as an XML file (i.e., with the extension ".xml").

Step 2 is to edit the XML file slightly. To do this, open it in a regular text editor and add the line <?xml-stylesheet type="text/css" href="export.css"?> right after the first line of the file (so that it becomes the second line of the file).
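Step 2 can also be scripted. This Python sketch assumes that the line to add is the standard xml-stylesheet processing instruction pointing at export.css (the stylesheet from Step 4), and that the first line of the file is the XML declaration on a line of its own. The names STYLESHEET_LINE and add_stylesheet_line are made up for this example.

```python
# Assumed content of the line to add: a standard xml-stylesheet processing
# instruction that links the export to export.css in the same directory.
STYLESHEET_LINE = '<?xml-stylesheet type="text/css" href="export.css"?>'


def add_stylesheet_line(xml_text, line=STYLESHEET_LINE):
    """Return xml_text with `line` inserted as the second line.

    The text is returned unchanged if the line is already present.
    """
    if line in xml_text:
        return xml_text
    # Split off the first line (assumed to be the XML declaration).
    first, _, rest = xml_text.partition("\n")
    return first + "\n" + line + "\n" + rest
```

Read the XML file into a string, pass it to add_stylesheet_line, and save the result over the original file.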

Step 3 applies only if you intend to view your downloaded journal using Microsoft Internet Explorer; if you intend to view it using any other browser, skip this step and proceed to Step 4. (This step works around a bug in Microsoft Internet Explorer; however, the XML file it produces will not work quite properly in other browsers.) In the XML file from Steps 1 and 2, replace every occurrence of "&amp;" with "&amp;amp;", of "&lt;" with "&amp;lt;", of "&gt;" with "&amp;gt;", and of "&quot;" with "&amp;quot;". Make the "&amp;" replacement first, so that the results of the other three replacements are not escaped a second time. Many text editors come with a "Find and Replace" utility, often with the ability to change all occurrences of certain text at once. If you are not sure whether your text editor has this feature, or how to use it, consult its documentation.
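The replacements in Step 3 can be made in one pass with a short script. This Python sketch assumes that the replacements are meant to apply to the entity references already present in the export ("&amp;", "&lt;", "&gt;" and "&quot;"), so that the file's own tags are left intact. The function name double_escape is invented for this example.

```python
import re

# Matches the four entity references that are to be escaped a second time.
ENTITY = re.compile(r"&(amp|lt|gt|quot);")


def double_escape(xml_text):
    """Turn &amp; &lt; &gt; &quot; into &amp;amp; &amp;lt; &amp;gt; &amp;quot;.

    All four are handled in a single pass, so no replacement can be
    applied twice to the same text.
    """
    return ENTITY.sub(r"&amp;\1;", xml_text)
```

Because every entity is rewritten in a single pass, the order of the replacements does not matter here, unlike a series of separate "Find and Replace" operations.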

Step 4 is to create a CSS stylesheet. You create it in the same way as you would a normal text file (using a text editor or word processor), except that when you save it, you save it as a CSS file instead of as a TXT file (i.e., with the extension ".css" instead of ".txt"). Name your stylesheet "export.css", save it in the same directory as your XML file from Steps 1-3, and give it the following text:

livejournal
{
  display: block;
  background-color: #9999FF;
  color: #000000;
}

entry
{
  display: block;
  border: 2px solid #000099;
  background-color: #EEEEFF;
  color: #000099;
  margin: 1em;
  padding: .5em;
}

itemid:before
{
  content: "Entry #";
}

itemid
{
  display: inline;
  background-color: #000099;
  color: #EEEEFF;
  padding: 1px;
  font-size: 80%;
}

eventtime
{
  display: inline;
  background-color: #000099;
  color: #EEEEFF;
  padding: 1px;
  font-size: 80%;
}

eventtime:after
{
  content: " local time";
}

logtime
{
  display: inline;
  background-color: #000099;
  color: #EEEEFF;
  padding: 1px;
  font-size: 80%;
}

logtime:after
{
  content: " LiveJournal time";
}

subject
{
  display: block;
  background-color: inherit;
  color: inherit;
  font-weight: bold;
  font-size: 120%;
}

event
{
  display: block;
  background-color: inherit;
  color: inherit;
  border-left: 2px solid #000099;
  margin-top: .5em;
  margin-bottom: .5em;
  padding-left: 3px;
}

security:before
{
  content: "Security level: ";
}

security
{
  display: inline;
  background-color: #000099;
  color: #EEEEFF;
  padding: 1px;
  font-size: 80%;
}

allowmask:before
{
  content: "Allowmask: ";
}

allowmask
{
  display: inline;
  background-color: #000099;
  color: #EEEEFF;
  padding: 1px;
  font-size: 80%;
}

current_music:before
{
  content: "Current music: ";
}

current_music
{
  display: block;
  background-color: inherit;
  color: inherit;
}

current_mood:before
{
  content: "Current mood: ";
}

current_mood
{
  display: block;
  background-color: inherit;
  color: inherit;
}

If you are familiar with CSS, you should feel free to edit this file to your liking.

Known issues:
1. Because of the way LiveJournal exports your entries, any markup in your entries will be displayed rather than applied; so, for example, if a given entry has a hyperlink in it, then you will see something like <a href="http://www.example.com/">example text</a> in your entry when viewing it this way. (Note: Microsoft Internet Explorer has a bug which means that if you do not follow Step 3 above, the markup will be neither displayed nor applied. If you follow Step 3 above, Internet Explorer will display the markup rather than apply it.)
2. LiveJournal does not export autoformatting, so line breaks will appear only as spaces. You can change this by adding a CSS rule such as entry { white-space: pre; } to export.css; however, if you have any long paragraphs, this will result in horizontal scrolling. Short of editing the XML file (by hand or with a script) to insert line breaks in long paragraphs, there is no way to address both of these issues at once.
3. There is a bug in Microsoft Internet Explorer that affects the appearance of your exported journal: it creates a white background around a smaller blue background. There is no way to alter the white background; however, if you like, you can lessen the effect by setting the blue background to white. To do this, change the background-color declaration in the livejournal rule (the fourth line of export.css) from background-color: #9999FF; to background-color: #FFFFFF;.
4. There is a bug in Microsoft Internet Explorer that causes the different parts of the entry (current mood, time posted according to LJ, etc.) not to be labeled explicitly. There is no simple way to address this; however, you can edit the CSS stylesheet such that the different parts appear different (or not at all, using the property display: none; on any entry part).
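Known issue 2 mentions inserting line breaks into long paragraphs. That idea can be sketched in Python with the standard textwrap module as follows. The function name wrap_long_lines and the default width of 80 characters are arbitrary choices for illustration.

```python
import textwrap


def wrap_long_lines(text, width=80):
    """Wrap every line longer than `width`, keeping existing line breaks."""
    wrapped = []
    for line in text.split("\n"):
        if len(line) > width:
            # Break only at whitespace, never inside a word or entity.
            pieces = textwrap.wrap(line, width=width,
                                   break_long_words=False,
                                   break_on_hyphens=False)
            wrapped.extend(pieces or [""])
        else:
            wrapped.append(line)
    return "\n".join(wrapped)
```

Applied to the text of an entry that is then viewed with white-space: pre, this keeps the original line breaks while limiting horizontal scrolling. A single word longer than the chosen width is left unbroken, so some scrolling may remain.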

Contributed by ruakh, erin, janinedog, adcott, arie, deslea, and technodummy.
