SQL / CS question

Aug 08, 2008 14:36

GEEK QUESTION ( Read more... )

Leave a comment

jpmodisette August 9 2008, 14:45:24 UTC
This is getting tricky. What you want then is that there be N matching search_term fields in records with the same field_id, with the additional restriction that each of the keywords given only be used once. Contrary to what I said earlier, I don't think enforcing that the N matches be from different fields in the original table - it wouldn't stop either of those erroneous matches you mention.

It seems like you need a nested query: first find all matches to the first term, then take all the fields that have an entry_id that shows up in that set of matches, then from that reduced table find all matches to the second term, and further reduce the table to those fields that have an entry_id that also shows up in the second set of matches, etc. That would address your two cases there. It would still have the problem that "bob b" would match "bob smith" because both the "bob" and the "b" would match "bob".

Like:

SELECT DISTINCT entry_id
FROM SecondTable
WHERE entry_id IN ( SELECT entry_id FROM SecondTable WHERE search_term = 'bob' )
AND search_term = 'smith'

You would get an additional level of nesting for each search term.

Reply

jpmodisette August 9 2008, 14:58:43 UTC
If you ordered the search terms in order if decreasing length, then I think you could avoid the "bob b" / "bob smith" failure with

SELECT DISTINCT entry_id
FROM SecondTable
WHERE entry_id IN ( SELECT entry_id FROM SecondTable WHERE search_term LIKE 'bob%' )
AND search_term LIKE 'b%' AND search_term NOT LIKE 'bob%'

That would ALMOST work, and might be good enough. The only failure there is that "bob b" wouldn't match "Bob Bobbert".

Reply

jpmodisette August 9 2008, 15:06:54 UTC
I would go with the earlier one - you might occasionally get an extra match, but that beats missing matches. The NOT LIKE one just above could really fail spectacularly in some instances; "d d" would never match anything, for instance, whereas the earlier one would just match the same set of things for "d d" that it did for "d".

There may be some way to fix it with COUNT but I don't immediately see it.

Reply

halffast August 9 2008, 18:05:46 UTC
well, we have a snag of the database it's on not doing subqueries

but i dumped it to a database that does and that seems to work

also, what works is having a "full name" field (which we happen to have).

so it takes the earlier having count search and also does a match on the full name
so "dana w" gets matched as "dana%w%"

the side effect is that doesn't match "wilson danata"

which was a specification at one point, but might now be undesired.

but between a mix of which database needs to be used and whether the search should match in order, one of those two should work, until i figure out the next issue.

thanks

Reply


Leave a comment

Up