Export

Jun 11, 2011 13:11

Semagic can (again) export all entries, and they are now exported with comments and embedded pictures. The export is available via Links/Synchronization. To avoid overloading the LJ servers, it pauses for 1 minute after each request (of 100 entries or 1000 comments). Also, entries and comments are kept in two single files to optimize work with large ( Read more... )
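
The pacing described above can be illustrated roughly as follows. This is only a sketch of the general idea, not Semagic's actual code: `paced_batches`, `fetch_batch`, and the offset-based paging are hypothetical names and assumptions, and only the batch size of 100 and the 1-minute pause come from the post.

```python
import time
from typing import Callable, Iterator, List


def paced_batches(fetch_batch: Callable[[int], List[dict]],
                  batch_size: int = 100,
                  pause_seconds: float = 60.0,
                  sleep: Callable[[float], None] = time.sleep) -> Iterator[List[dict]]:
    """Yield batches from fetch_batch(start), pausing between requests.

    fetch_batch is a hypothetical callable that returns up to batch_size
    items beginning at offset `start`.
    """
    start = 0
    while True:
        batch = fetch_batch(start)
        if not batch:
            return  # nothing left to download
        yield batch
        start += len(batch)
        if len(batch) < batch_size:
            return  # a short batch means this was the last one
        sleep(pause_seconds)  # be gentle with the server before the next request
```

Passing `sleep` as a parameter keeps the loop easy to try out without actually waiting a minute per request.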

new version

kate_nepveu June 11 2011, 14:26:10 UTC
Thanks, this is really cool.

However, when it says it's "Downloading comment metadata starting from 1," it immediately returns a "Download comment metadata error."

(I do cross-post to Dreamwidth with comments turned off here, but I have years of LJ entries with comments.)

quirrc June 11 2011, 14:44:24 UTC
Can you use a packet sniffer to see what you get from the server? http://www.wireshark.org/download.html
Install it, select an interface in the Capture/Interfaces menu, press Start, set Semagic to synchronize only comments, and make a request. Then press Capture/Stop, locate the request GET /export_comments.bml?, right-click it, select Follow TCP Stream, and look at the server reply, shown in blue. It should look like

HTTP/1.1 200 OK
Server: GoatProxy 1.0
Date: Sat, 11 Jun 2011 14:37:26 GMT
Content-Type: text/xml; charset=utf-8
Connection: keep-alive
X-AWS-Id: ws17
Content-Length: 219
X-Varnish: 2052710167
Age: 0
X-VWS-Id: bil1-varn19
X-Gateway: bil1-swlb05

11197

i.e. it contains maxid, comment data, and the usermap. The error that you see means the reply does not contain maxid. Maybe for you the reply is broken on the LJ side (maybe because you have an underscore in the name or something). I will report it, but please provide your exact server reply.
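
As a rough illustration of the check described above, the sketch below parses a reply and fails when maxid is missing. The element and attribute names (`livejournal`, `maxid`, `comment`, `usermap`, `id`, `posterid`, `user`) are assumptions about the reply format and do not appear in this thread, and `read_comment_meta` is a hypothetical helper, not Semagic's code.

```python
import xml.etree.ElementTree as ET


def read_comment_meta(xml_text):
    """Return (maxid, comments, usermap) from a comment_meta reply.

    Raises ValueError when the reply has no maxid element, which is the
    condition reported as a comment metadata download error.
    """
    root = ET.fromstring(xml_text)
    maxid = root.findtext("maxid")  # assumed to be a direct child of the root
    if maxid is None:
        raise ValueError("reply does not contain maxid")
    # comment id -> poster id (None when the attribute is absent)
    comments = {int(c.get("id")): c.get("posterid") for c in root.iter("comment")}
    # poster id -> user name
    usermap = {int(u.get("id")): u.get("user") for u in root.iter("usermap")}
    return int(maxid), comments, usermap
```

An HTML error page parses without a maxid element, so it ends up in the ValueError branch instead of being mistaken for an empty journal.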

quirrc June 11 2011, 14:55:29 UTC
Also try to log in with IE with "remember me" selected, then select the option to use the cookie from IE in the synchronization options. Maybe it is an authorization error: Semagic generates the cookie manually, and maybe it uses the wrong format.

kate_nepveu June 11 2011, 15:46:15 UTC
quirrc June 11 2011, 15:58:01 UTC
302 means a redirect to an error page for wrong authorization. Can you manually open http://www.livejournal.com/export_comments.bml?get=comment_meta&startid=1 in the browser while logged in?

Semagic loads this page. If it is empty or redirects to an error page, that is an LJ error for sure.
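
The cases above can be written down as a small decision helper. This is a sketch only: `diagnose_export_reply` is a hypothetical function, and the wording of the returned messages is mine; the mapping of 302 to an authorization problem and of an empty page to an LJ-side error follows the comments above.

```python
def diagnose_export_reply(status, location, body):
    """Map one HTTP reply for export_comments.bml to a likely cause.

    status   -- HTTP status code of the reply
    location -- value of the Location header, or None if absent
    body     -- reply body as text
    """
    if status == 302:
        target = location or "an unknown page"
        return "redirected to %s: likely an authorization error" % target
    if status == 200 and not body.strip():
        return "empty reply: likely an error on the LJ side"
    if status == 200:
        return "reply has content: the server side looks fine"
    return "unexpected status %d" % status
```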

kate_nepveu June 11 2011, 16:00:01 UTC
I get:

(Hmm, does pre work in LJ comments? Let's see (answer: no):

19379

10001

Then lots of comments, then usermaps.

quirrc June 11 2011, 16:07:39 UTC
So that is a Semagic or system error. Please turn on the sniffer, open that page in the browser, preferably IE (logged in with "remember me" checked), and save the request, not the server reply. Then connect via Semagic with IE cookies (set in Synchronization Options, not in the connection settings in the login window) and save the request again. I will then compare the browser request, which returns correct results, with the one from Semagic.

kate_nepveu June 11 2011, 18:17:46 UTC
quirrc June 12 2011, 01:29:58 UTC
From what I see, it does not contain a cookie at all. Could you also provide the request from Semagic without IE cookies? I need the first request, the one that returns 302, to GET /export_comments.bml, not GET /?returnto=%2Fexport_comments.bml

kate_nepveu June 12 2011, 01:40:37 UTC
quirrc June 12 2011, 01:57:31 UTC
Do you have winhttp.dll on your system? Semagic uses either wininet, which is part of IE, to send requests with cookies from IE, or winhttp for manual cookies. For winhttp, you do not see a GET request at all; only a session is generated. That may mean that you do not have winhttp on your system. Does a search for the file winhttp.dll find anything? Also, when you log in to LJ in one instance of IE and then open another instance (via the icon), is it also logged in? Cookies are either global or per process, and Semagic can access only global cookies. If the other instance is not logged in, that means you run IE in protected mode, or there is some system error, or you did not check "remember me".

kate_nepveu June 12 2011, 02:03:09 UTC
1) I'm running Windows 7 and winhttp.dll is in C:\Windows\System32;

2) I opened an instance of IE, was logged in there, and opened another and was also logged in there.

If this is looking like some weirdness specific to me, then I hate to take up any more of your time.

quirrc June 12 2011, 02:25:31 UTC
That seems to be a problem with httponly cookies. The ljsession cookie is marked httponly and is not sent with Semagic requests, only with browser requests. That is a system error for sure, unless there are some IE settings that may affect it (protected mode or something). You can try to clear the IE cookie cache and log in with IE again. Also try to download comments for another user, on Dreamwidth, etc.
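
For anyone checking a capture for this, the httponly mark is an attribute on the Set-Cookie header that the server sends at login. Below is a minimal sketch for spotting it with Python's standard `http.cookies` module; `httponly_cookie_names` is a hypothetical helper and the cookie values used with it are made up.

```python
from http.cookies import SimpleCookie


def httponly_cookie_names(set_cookie_header):
    """Return the names of cookies in a Set-Cookie value that carry HttpOnly."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    # A morsel's "httponly" entry is an empty string unless the flag was present.
    return [name for name, morsel in jar.items() if morsel["httponly"]]
```

For a header value such as `ljsession=example; path=/; HttpOnly` (a made-up value) this returns `["ljsession"]`, while a cookie without the flag is not listed.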

kate_nepveu June 12 2011, 02:31:13 UTC
Ah! Semagic defaulted to using IE as a proxy in the connection settings; telling it not to do that has fixed the problem (at least, it's preparing to download text of, eek, quite a lot of comments now).

Thank you so much for taking the time to help me track this down. I doubt I would have found it on my own.

(FYI: Semagic won't download from Dreamwidth if posts have been crossposted; it doesn't know how to interpret the extra information DW puts in.)

quirrc June 12 2011, 03:06:59 UTC
So you selected direct connection? Did you set IE cookies in Synchronization Options?
As for crossposting, that is a DW error; I submitted a support request.

kate_nepveu June 12 2011, 03:18:59 UTC
Direct connection, no IE cookies, and downloading like crazy. Yay!

Here's the information I turned up a while ago re: trying to deal with DW-auto-crossposted things in Semagic:

http://dw-news.dreamwidth.org/12509.html?thread=1324509#cmt1324509
