Jan 09, 2006 18:57
Tagging! It is popular! For my current project (in TurboGears) I decided to do a tag-based keywording system. I wanted handling the tags to be as simple as possible, to the point of just feeding an object a string and have it chop it up for me and manifest the list of tags as a string. I made a simple keyword class:
class keyword(SQLObject):
class sqlmeta:
defaultOrder="keyword"
keyword = StringCol(length=64, alternateID=True, unique=True)
documents = RelatedJoin('document')
And then you can add the following methods to your document SQLObject in the model to allow them to be tagged by setting the keywordstring:
def _set_keywordstring(self, keywordstring):
import sets
p = sets.Set()
for kw in self.keywords:
self.removeKeyword(kw)
for word in keywordstring.replace('/', ':').split(' '):
word = word.lower()
kw = None
try:
kw = keyword.byKeyword(word)
except sqlobject.SQLObjectNotFound:
kw = keyword(keyword=word)
if not word in p:
self.addKeyword(kw)
p.add(word)
def _get_keywordstring(self):
return ' '.join(map(lambda x: x.keyword, self.keywords))
Then you can make documents and set document strings!
myDocument = document(title="Some document")
myDocument.keywordstring = "web programming awesome"
for document in keyword.byKeyword("awesome").documents:
print "Document: '%s', keywords: %s" % (document.title, ' '.join([kw.keyword for kw in document.keywords])) # "Some document" should be in here
Obviously, you can extend this to other things as well: add a links = RelatedJoin('link') to your keyword and add a keywords = RelatedJoin('keyword') in your link object definition and you're golden.
Other potential useful things: set up comma-separated tag fields rather than del.icio.us-style space separated ones, using [x.strip() for x in somestring.split(',')]. Using shlex for smarter tokenizing. Also, if you don't like alphabetized tags, or want a tag to appear multiple times, or tags to maintain case, well, that's all possible but cutting out the list comprehension/set trickery in the _set_keywordstring method. I thought an all-lowercase, one-time-use set of tags was unambiguous and less confusing than a more freeform system for the task at hand. Also, note that any tag with a forwardslash has it replaced with a colon. This was kind of important because I have URLs that look something like /keyword/whatever/ and a tag with a forwardslash would break the URL pattern.
One thing I cut out of my example code: deleting tags that are no longer in use by anything. This is simple enough with a single DELETE statement, something like DELETE FROM keyword WHERE id NOT IN (SELECT DISTINCT keyword_id FROM document_keyword) would work. Note if you're using more than one thing tagged, like links, you can do something like DELETE FROM keyword WHERE id NOT IN (SELECT DISTINCT keyword_id FROM document_keyword UNION SELECT DISTINCT keyword_id FROM keyword_link) and just keep chaining UNIONs to include all your pertinent tagged tables.