Android, Palm, Python, Unicode etc

Aug 18, 2011 00:02

Migration from Palm to Android

I finally decided to use an Android phone as my primary phone instead of my dearly loved Palm Treo, and it took some effort to migrate my existing data into a new form. What I needed to migrate was the address book and a large collection of memos with miscellaneous info.
The address book was handled by exporting the Mac Address Book, which is in sync with the old phone, into a business card format that could be imported into Android using HTC Sense.
For memos I chose Wikilin, which is more or less what I want, but the data still had to be migrated.
A year or so ago, when I was trying to find a memo replacement with local sync for Android, some smart guy suggested using any wiki that stores plain text files, together with Subversion sync from the SD card. The idea looked like nonsense to me at the time, but now I think it is one of the few options available with minimal effort and without writing any Android sync software. Wikilin fits this concept nicely.
I can have a simple Mac client that edits notes, or I can use a Chrome browser plugin for that.

Task definition

The output format of the memo export is a set of directories, one per category, containing an individual file for each memo, named after the memo's first line. In addition to the category, I used specially formatted tags at the end of each memo to simplify search.
"Reverse engineering" the input format of Wikilin showed that all memos are stored in a single directory, with each memo titled according to its file name. Tags are stored in a special file that contains one line per memo file: a comma-separated list of strings where the first string is the file name and the subsequent ones are tags.

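The tags file layout described above can be sketched roughly as follows. This is only an illustration in Python 3 (not my original script); the function names are made up, and it assumes that neither file names nor tags contain commas, since I do not know how Wikilin would escape them.

```python
def format_tag_line(file_name, tags):
    """Build one line of the tags file: file name first, then its tags."""
    return ",".join([file_name] + list(tags))


def parse_tag_line(line):
    """Split one line of the tags file back into (file_name, tags)."""
    parts = line.rstrip("\n").split(",")
    return parts[0], parts[1:]


if __name__ == "__main__":
    line = format_tag_line("Shopping list", ["home", "todo"])
    print(line)
    print(parse_tag_line(line))
```

Writing the whole tags file is then just one such line per memo.
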
Why Python

I have seen a number of Python scripts in my life, written for build automation. They looked more or less readable, but I never had a need to write one myself. I thought this would be the right task to give it a try. The other candidates were Perl and Lisp.
Perl, despite my occasional experience with it, always looked quite inhuman to me. Every time I came back to it after a break I struggled with its rich syntax.
Lisp looked promising since I'm trying to learn it, but I have serious doubts that I could roll out a working product in a few hours.

First impressions

I managed to write an initial version that scans files and collects tags in two hours. That was very impressive for such a slowpoke as I am. The syntax feels very intuitive for basic tasks such as searching and scanning files and processing their contents.
I found it funny to rely on indentation to separate blocks, but it has its point and I like it. It actually makes the code compact. Working with regexps was obvious after Java. I also liked the way members of collections and dictionaries are accessed. None of the Perl nonsense and Java verbosity. :)
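To give an idea of what "scans files and collects tags" looks like, here is a minimal sketch in Python 3. It is not my original script: the `tags:` marker on the last line is a hypothetical stand-in for my own tag format, and the function names are invented for the example.

```python
import os
import re

# Hypothetical tag marker; the real format of my tags is not shown in this post.
TAG_LINE = re.compile(r"^tags:\s*(.+)$", re.IGNORECASE)


def extract_tags(text):
    """Return the tags found on the last non-empty line, or an empty list."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    match = TAG_LINE.match(lines[-1])
    if not match:
        return []
    return [tag.strip() for tag in match.group(1).split(",") if tag.strip()]


def collect_memos(root, encoding="utf-8"):
    """Walk the export directory and map each memo path to (category, tags).

    The category is taken from the name of the directory holding the memo.
    """
    memos = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        category = os.path.basename(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, encoding=encoding) as handle:
                memos[path] = (category, extract_tags(handle.read()))
    return memos
```

The dictionary access and the regexp match are the parts that felt so natural coming from Java.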

Unicode surprises

Once the initial conversion script was finished, it was time to try it. The first problems arose in the form of Cyrillic: Russian text was lost or unreadable. I had to handle Unicode files and names explicitly. Defining the root data paths as Unicode strings made the file scanning part happy. The next surprise was that the files exported from the Palm were stored as UTF-16 little endian without a BOM, which had to be specified explicitly. I used a text editor to reload a file in different encodings to figure out the right one. Wikilin files are UTF-8, which is the default behaviour when saving.
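The encoding conversion itself is short. Below is a sketch in Python 3 (where every string is already Unicode, so the root path trick is no longer needed); the function names are invented for the example. The `utf-16-le` codec is the one to use when there is no BOM, because the plain `utf-16` codec relies on a BOM to detect byte order.

```python
def decode_palm_memo(raw_bytes):
    """Decode exported memo bytes stored as UTF-16 little endian, no BOM."""
    return raw_bytes.decode("utf-16-le")


def convert_memo(src_path, dst_path):
    """Read a UTF-16 LE memo and write it back out as UTF-8."""
    with open(src_path, "rb") as src:
        text = decode_palm_memo(src.read())
    # newline="" keeps the original line endings untouched.
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```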

Normalized form

Once the text encoding was supposedly fixed and the memos were successfully imported, I tried to verify that all notes were readable, and noticed that I couldn't find _some_ of the notes with Russian names. A bit of searching showed that memos with accented characters in their names were nowhere to be seen when searching by tags. Accented characters in titles were replaced by squares, a good sign that something was wrong with the encoding. I could still see those memos when browsing the full list. Surprisingly, when I retagged an unreachable entry in Wikilin, I saw two lines with the same file name in the tags file. But those strings were not equal. Then I remembered that Unicode can represent the same character in more than one way. It turned out that the file name had accented characters stored as a base character plus a combining accent, while the tags file needed the single-character representation. To fix that, the string has to be normalized. Adding normalization fixed the problem with tags, and now almost all memos became visible.
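The problem and the fix can be shown in a few lines with the standard `unicodedata` module. This is a sketch in Python 3, not the exact line from my script, and it assumes that the composed form (NFC) is the "single character" representation the tags file wanted. The Russian letter "й" is a handy example: it can be stored either as one code point or as "и" followed by a combining breve.

```python
import unicodedata

composed = "\u0439"          # "й" as a single code point
decomposed = "\u0438\u0306"  # "и" followed by a combining breve

# The two strings look identical on screen but do not compare equal.
print(composed == decomposed)


def normalize_name(name):
    """Return the composed (NFC) form of a file name."""
    return unicodedata.normalize("NFC", name)


# After normalization both spellings compare equal.
print(normalize_name(decomposed) == normalize_name(composed))
```

Normalizing every file name before writing it to the tags file makes the name in the tags file and the title agree.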

Final adjustments

Finally, I had to fix file names to conform to Wikilin's restrictions on special characters in memo names by replacing those characters with underscores. Now my script can handle all the notes I had, and it also auto-numbers duplicate titles.
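A sketch of this last step in Python 3 is below. The set of characters in `UNSAFE` is an assumption made for illustration (the exact list Wikilin rejects is not reproduced here), and the `_2`, `_3` numbering scheme and the function names are likewise just one possible choice.

```python
import re

# Assumed set of unsafe characters; adjust to whatever Wikilin actually rejects.
UNSAFE = re.compile(r'[\\/:*?"<>|#\[\]]')


def safe_title(title):
    """Replace every unsafe character with an underscore."""
    return UNSAFE.sub("_", title).strip()


def unique_title(title, used):
    """Return a safe title not yet in `used`, numbering duplicates."""
    base = safe_title(title)
    candidate = base
    counter = 2
    while candidate in used:
        candidate = f"{base}_{counter}"
        counter += 1
    used.add(candidate)
    return candidate
```

The `used` set is shared across the whole run, so two memos whose titles collapse to the same safe name still end up in different files.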

Conclusion

In a pair of two-hour sessions I managed to write a single-page script that converts text files and arranges them according to the new layout, generates explicit and implicit tags, and converts encodings. All that was done in a language I had never used before, with the help of the language and library reference and a bit of googling.

Next steps?

I could try to write a two-way conversion between address book .vcf entries that preserves non-standard fields between the Mac and Sense.
SL4A also has Python support, which lets you script in Python right on the device.

python, programming, android
