SQL / CS question

Aug 08, 2008 14:36

GEEK QUESTION ( Read more... )

Leave a comment

jpmodisette August 8 2008, 22:42:43 UTC
I'm not entirely sure I understand your problem. Let me paraphrase and tell me if I am right: you have taken all the entries in the searchable fields and stuck them in one bigger table with a single field, then you index and search that, so that if the search is "first name = Dana OR last name = Watanabe" it turns into a search for "value = Dana" and a search for "value = Watanabe", which is fast, but then there is no way to search for "first name = Dana AND last name = Watanabe"?

Reply

halffast August 9 2008, 00:59:51 UTC
there is one search field and lots of columns to search (name, dept, email, etc), but simplified to two columns

someone searches "dana watanabe"

and i want the results of "first name = Dana AND last name = Watanabe" OR "first name = Watanabe AND last name = Dana"

and i want this in SQL not some homebrewed database like i know you would do. i've already suggested that. :)

Reply

jpmodisette August 9 2008, 01:31:12 UTC
I think something like this will do it (note I am not a SQL guy, but I can use google):

SELECT entry_id
FROM SecondTable
WHERE search_term IN ( 'Dana', 'Watanabe' )
GROUP BY entry_id
HAVING COUNT(*) >= 2

That says 1. make a list of all records where search_term is either Dana or Watanabe, 2. sort those records by entry_id, and 3. only list the records where the number of records with than entry ID is >= 2 (I said ">=" instead of "=" so we wouldn't throw out your entry if you also lived on Dana Street.

Reply

jpmodisette August 9 2008, 01:31:34 UTC
)

Reply

jpmodisette August 9 2008, 01:32:25 UTC
Ok the "I can use google" comment was unnecessary - but with this user icon I feel like I have a certain standard of smugness to maintain.

Reply

jpmodisette August 9 2008, 01:59:01 UTC
This will also return entries with first name Dana and last name Dana, incidentally. I bet you could get around that by adding a third field to SecondTable, "field", and then doing some sort of uniqueness test on field to throw out rows that don't differ in either field or entry_id.

Reply

halffast August 9 2008, 03:27:01 UTC
well data in the fields repeats, actually, and oddly, so they would actually be uniqued before putting into that index table.

however, i do think that's the exact thing i was looking for. thanks.

i might still be stuck out of SQL mode, but what'd you google?

Reply

jpmodisette August 9 2008, 04:02:39 UTC
I had to google a bunch of stuff, to figure out all the basic SQL commands - I remembered SELECT so I started with "SQL SELECT" and then kept googling commands I saw in examples until I understood how to write a query; then I happened across the count() function so I googled SQL count and looked at some examples related to that and found one that used HAVING, and then I saw how to do what you wanted.

Reply

halffast August 9 2008, 14:32:56 UTC
the case you mentioned is actually the problem, but it's more because we want "chris" to match "christopher" and we want to be able to match a search for, say "dana w", so we wildcard the end of all search words.

which means that both "tom tomkins" would match on "tom billings"
and "bob bradley" would match on "tom b"

really, you want to be able to say "match this term, but only once per id". i'm not sure how to exclude records by use of a third field like you're mentioning, though

Reply

jpmodisette August 9 2008, 14:45:24 UTC
This is getting tricky. What you want then is that there be N matching search_term fields in records with the same field_id, with the additional restriction that each of the keywords given only be used once. Contrary to what I said earlier, I don't think enforcing that the N matches be from different fields in the original table - it wouldn't stop either of those erroneous matches you mention ( ... )

Reply

jpmodisette August 9 2008, 14:58:43 UTC
If you ordered the search terms in order if decreasing length, then I think you could avoid the "bob b" / "bob smith" failure with

SELECT DISTINCT entry_id
FROM SecondTable
WHERE entry_id IN ( SELECT entry_id FROM SecondTable WHERE search_term LIKE 'bob%' )
AND search_term LIKE 'b%' AND search_term NOT LIKE 'bob%'

That would ALMOST work, and might be good enough. The only failure there is that "bob b" wouldn't match "Bob Bobbert".

Reply

jpmodisette August 9 2008, 15:06:54 UTC
I would go with the earlier one - you might occasionally get an extra match, but that beats missing matches. The NOT LIKE one just above could really fail spectacularly in some instances; "d d" would never match anything, for instance, whereas the earlier one would just match the same set of things for "d d" that it did for "d".

There may be some way to fix it with COUNT but I don't immediately see it.

Reply

halffast August 9 2008, 18:05:46 UTC
well, we have a snag of the database it's on not doing subqueries

but i dumped it to a database that does and that seems to work

also, what works is having a "full name" field (which we happen to have).

so it takes the earlier having count search and also does a match on the full name
so "dana w" gets matched as "dana%w%"

the side effect is that doesn't match "wilson danata"

which was a specification at one point, but might now be undesired.

but between a mix of which database needs to be used and whether the search should match in order, one of those two should work, until i figure out the next issue.

thanks

Reply


Leave a comment

Up