New GIN opclass for hstore (Faster and smaller) !: obartunov

obartunov

New GIN opclass for hstore (Faster and smaller) !

Apr 25, 2014 17:26

We feel a bit uncomfortable after cancellation of nested hstore project for jsonb sake,so we decide to release new hash-based GIN-opclass (from jsonb) for hstore, which greatly improves the performance of @> operator (we expect an order of magnitude). We expect it should works for 9.2+ postgres, but you can test it for older releases too. Patches are welcome. Please, download it from https://github.com/akorotkov/hstore_ops and sent us feedback, so we could officially release it. This is a separate extension, so you need contrib/hstore already installed.
UPDATE: It is has no problem with long keys (values) !

Here is a short example:

create extension hstore;
select hstore(t) as h into proc from pg_proc t;
create index hstore_gin_idx on proc using gin(h);
=# explain analyze select count(*) from proc where h @> 'proargtypes => "2275"';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=30.97..30.98 rows=1 width=0) (actual time=0.201..0.201 rows=1 loops=1)
-> Bitmap Heap Scan on proc (cost=20.02..30.96 rows=3 width=0) (actual time=0.094..0.196 rows=76 loops=1)
Recheck Cond: (h @> '"proargtypes"=>"2275"'::hstore)
Rows Removed by Index Recheck: 93
Heap Blocks: exact=50
-> Bitmap Index Scan on hstore_gin_idx (cost=0.00..20.02 rows=3 width=0) (actual time=0.082..0.082 rows=169 loops=1)
Index Cond: (h @> '"proargtypes"=>"2275"'::hstore)
Planning time: 0.067 ms
Execution time: 0.222 ms
(9 rows)

Now, we install hstore_ops

create extension hstore_ops;
create index hstore_gin_hash_idx on proc using gin(h gin_hstore_hash_ops);
=# explain analyze select count(*) from proc where h @> 'proargtypes => "2275"';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=18.97..18.98 rows=1 width=0) (actual time=0.008..0.008 rows=1 loops=1)
-> Bitmap Heap Scan on proc (cost=8.02..18.96 rows=3 width=0) (actual time=0.006..0.006 rows=0 loops=1)
Recheck Cond: (h @> '"proargtypes"=>"2275"'::hstore)
Heap Blocks:
-> Bitmap Index Scan on hstore_gin_hash_idx (cost=0.00..8.02 rows=3 width=0) (actual time=0.005..0.005 rows=0 loops=1)
Index Cond: (h @> '"proargtypes"=>"2275"'::hstore)
Planning time: 0.067 ms
Execution time: 0.032 ms
(8 rows)

One can see considerable improvement from 0.222ms to 0.032 ms, but the table proc has only 2744 rows and we need more results from hstore users.

Also, we added support of ?, ?|, ?& operators to gin_hstore_hash_ops (jsonb lacks them), but we didn't tested it and, again, we need your help.

Here is size comparison:

=# \dt+ proc
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------+-------+----------+---------+-------------
public | proc | table | postgres | 1616 kB |
=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+---------------------+-------+----------+-------+--------+-------------
public | hstore_gin_hash_idx | index | postgres | proc | 232 kB |
public | hstore_gin_idx | index | postgres | proc | 616 kB |
(2 rows)

Notice, that new opclass is not just faster, but also several times less !

hstore, pg, pgen