Prefix trees are known as an alternative implementation for dictionaries/maps. Applied like this, they sometimes significantly outperform hash tables. In particular, it works well when key space consists of thousands not-very-long strings. I saw this in action as a part of a smart order router code at a big sell-side firm where the keys were US
(
Read more... )
Comments 23
Reply
Reply
Have you gathered any statistics on how much the prefix tree is faster than dictionary search?
As for Aho-Corasick, I think it should be possible to build a big regular expression along the lines of "(C\+\+)|(Java)|(\.Net)|(Data Integration)", match against it, and then iterate over the list of matched subgroups to assign the resume to the corresponding bucket(s). E.g. if the text matches subgroup 0 and subgroup 2, it goes to "C++" and ".Net" buckets. I am quite positive that the algorithm used in RegExes is similar to Aho-Corasick (a linear state machine), but you won't need to code it by hand.
Reply
Reply
Reply
Reply
Reply
Reply
Reply
Вот на Tommy Carruthers иногда смотрю, по вашей рекомендации.
Reply
Leave a comment