Setting up Sphinx indexation for ODBC source

Jan 05, 2011 14:56

First: you have to install either unixodbc, either iodbc libraries - and also development packages for them. I've chose the unixodbc.
On RHEL you can found the unixodbc and unixodbc-dev rpms in the "Server" folder of the same disk - and installed these libraries.

Then I have taken the sphinx sources and build it with:

./configure --prefix=/home/sm/sphinx --with-unixodbc --enable-id64 --without-mysql
make -j3
make install

Then I've downloaded the freetds from http://www.freetds.org/ (I just used "stable release" ftp://ftp.ibiblio.org/pub/Linux/ALPHA/freetds/stable/freetds-stable.tgz)

I've unpacked it and installed (no caveats, simple ./configure && make && make install).
Also it could be configured for specific protocol version, for example:

./configure --with-tdsver=8.0
make
make install

According to FAQ on freetds homepage, the 8.0 version is for MS SQL server 2000. The MS SQL 2005 works well with 9.0. I don't know about your version - so, simple note this possible flag on your build.

May be you are going to use another ODBC driver to work with MS SQL (if you know, the unixodbc/iodbc is only the ODBC environment, and you have to install the actual ODBC driver also. For MS SQL there are some commercial trial drivers, and also freetds).

For testing purposes I've installed the MS SQL Express on the second VM. Also I've set up this MS SQL to wait the remote connection (since by default it uses shared memory, no TCP) - and I've created the test database with name "my_test". In this database I've created the only table "tst_table" with three fields: "ID", "title" and "text". The first one is integer, the rest two are textual. I've put several records into the test table (7, to be precise). This Vm in virtual network has the IP 192.168.1.101, and MS SQL is listen the port 1433. Also I've assigned the password "1234" for the user "sa" (which is by default the MS SQL Server administrator). So, this is a short description of the working server I've targeted to index from.

I've added the server connection to /usr/local/etc/freetds.conf

[MyMSQL]
host = 192.168.1.101
port = 1433
tds version = 8.0

You can see the example of such description in this dist file itself

Then - test your MS SQL connection:

----------------
[alexey@rhel ~]$ tsql -S MyMSQL -U sa
locale is "en_US.UTF-8"
locale charset is "UTF-8"
Password:
1> select * from syscolumns
2> go

bla bla bla (a lot of result lines)
-----------------

Note that I've used the "MyMSQL" as the server name - this is the one I've assigned in freetds.conf. Since all is ok, the tds is working - let us add the freetds into the list of available ODBC drivers.
Create the text "tds.driver.template", whith this content:

[FreeTDS]
Description = v0.82 with protocol v8.0
Driver = /usr/local/lib/libtdsodbc.so

And register the driver in the system:

# odbcinst -i -d -f tds.driver.template

Actually you could not use the odbcinst and directly add the lines into /etc/odbcinst.ini. However, using odbcinst is the right way, recommended in documentation, since it could be used in scripts.
Then create the data source, which uses this driver (FreeTDS). Again create the text file "tds.datasource.template", contains:

[my_sample]
Driver = FreeTDS
Description = my_test
trace = No
Servername = MyMSQL
Database = my_test

Notes here: the Driver points to the name which we just descripted in the previous template and registered in /etc/odbcinst.ini. The ServerName points to the name we devined in freetds.conf above. And - the Database points to the name of the database on remote server. Since I've created the database "my_test" for test purposes on my test server - so, I've included exactly this name.

We can then register this source with the command:

$ odbcinst -i -s -f tds.datasource.template

Actually it will just merge the template into the ~/.odbc.ini file. And in the case of clean system (as I have) - it actually copies this file there. Note, that this is local datasource, which is available only to current user.

Now, test the connection with this dsn:

[alexey@rhel ~]$ isql -v my_sample sa 1234
+---------------------------------------+
| Connected!                                |
|                                                      |
| sql-statement                             |
| help [tablename]                       |
| quit                                              |
|                                                      |
+---------------------------------------+
SQL>

Notes: "-v" is simple for verbosity. Since setting up of connection could waste much of your time, this is good to know what is happening - to google the errors than, for example. The very often error here is authentification. Since on MS it is popular to use "intergated system authentification", it could be set up by default. But FreeTDS has nothing to do with such authentification, and so you need to set up your MS SQL to authentificate itself, or use both types of authentification. The parameters of isql are the 1) name of DSN (this is exactly the one we defined in square brackets on the top of .odbc.ini; 2) the user UID (I've just used "sa" - the MS SQL server administrator); and 3) the password for this user. I've set "1234" since I've set this password for "sa" on my test server.

All works?
Well. I've moved the dsn definition from the local (~/.odbc.ini) file into the system-wide (which is /etc/odbc.ini). You could just

# cat ~/.odbc.ini >> /etc/odbc.ini

, or other way copy the whole dsn section from the local file to the system one. Also - you can delete your local file then, or just delete the section - to avoid confusion of two same-named sections in future.

After such moving it sounds well to run the isql again - to check that you havn't broken something.
So, the ODBC connection works, and you are ready to set up your sphinx now!
As I mentioned above - I have the home folder for my instance of sphinx as /home/sm/

First - lets make a copy of sphinx-min.conf.dist into sphinx.conf:

$ cd /home/sm/sphinx/etc
$ cp sphinx-min.conf.dist spinx.conf

Then - edit the sphinx.conf. You need at least edit the "source" section, to teach the sphinx about your datasource. As I've described above, in unimportant section, I've the test table, named "tst_table" with 3 columns - id, title and text. So, this is my sphinx.conf source section:

source src1
{
type = odbc
odbc_dsn = DSN=my_sample;UID=sa;PWD=1234;
sql_query = SELECT id, title, text FROM tst_table
sql_query_info = SELECT * FROM tst_table WHERE id=$id}

Note: for sphinx version < svn r2446 you also have to add mandatory fields (which actualy will not be used, but necessary to pass the syntax checker)

source src1
{
type = odbc
odbc_dsn = DSN=my_sample;UID=sa;PWD=1234;
sql_host = fake
sql_user = fake
sql_pass = fake
sql_db = fake
sql_query = SELECT id, title, text FROM tst_table
sql_query_info = SELECT * FROM tst_table WHERE id=$id}

All is done! Lets run the indexer!

$ /home/sm/sphinx/bin/indexer --all
Sphinx 1.10.1-id64-dev (r)
Copyright (c) 2001-2010, Andrew Aksyonoff
Copyright (c) 2008-2010, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/home/sm/sphinx/etc/sphinx.conf'...
indexing index 'test1'...
collected 7 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 7 docs, 97 bytes
total 0.082 sec, 1175 bytes/sec, 84.81 docs/sec
total 2 reads, 0.000 sec, 0.1 kb/call avg, 0.3 msec/call avg
total 6 writes, 0.001 sec, 0.1 kb/call avg, 0.2 msec/call avg
That's all!

fts, odbc, sphinxsearch, iodbc, unixodbc, sphinx

Previous post Next post
Up