Cheap Search Service

Apr 10, 2021 14:42


I stood up an AWS Elasticsearch cluster a couple of months ago. It had some mode that was supposed to be cheaper to run, based on usage. Well... it was not cheap to run. I didn't monitor it closely enough and basically flushed... an embarrassing amount of money down the corporate toilet. Oh, well. My bad.

So now I've been investigating: is there a way to get Elasticsearch-ish functionality without the prohibitive cost? After a little Googling, I found this blog post where someone tried to do the same thing. He compiled his Java to native code to improve the cold-start time. I think just using the Serverless Framework's plugin for keeping the Lambda warm would be acceptable to me for both cost and simplicity of implementation, so I plan to forgo that extra complexity. But the basic premise is the same: use Lambda, attach EFS, store the index there.



So I spent a bit of time trying to learn Lucene (the open source library that Elasticsearch uses internally). Last night I got all the basic capabilities I wanted mostly working. I still don't get the "Highlighter" add-on, which shows each search hit in the context where it occurs in the text... And I don't get "term vectors". There are all sorts of things I don't get. I have been finding it difficult to learn just from the Javadocs, even though they do have a lot of higher-level info. I think it is possible to learn that way, but it would take a lot of experimentation and fruitless searching. Most tutorials I've found online are for very old versions of the library. I eventually decided to just page through tutorial search results and check the version of Lucene used in each one, which led me to this one for Lucene 8.x, written in 2019. He has some good advice and covers a variety of subjects I'd like to learn more about. I may come back to it as a reference.

Now I'm trying to set up a Java Lambda to talk to an EFS file system. I found this guide useful for getting up and running quickly with java/mvn/serverless. After that I think I'll try to expose just enough functionality to index a new document and then search the current index. Making some progress. It's been fun and interesting and frustrating in about equal measure. We'll see if I can get to a point where I have something viable to use for my website.

...

Finally got the Lambda set up so that I could create a file on the EFS file system and, when running the Lambda again, find that it was still there. A sneaky permissions issue was solved after some searching via this post.
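
For reference, the persistence check amounts to something like the following. This is a minimal sketch rather than the exact code I ran: the class name EfsPersistenceCheck, the marker file name, and the /mnt/efs mount path are all placeholders I made up for illustration, and the real mount path is whatever the function's file system configuration says. It uses only the standard library, so the same method can be called from a Lambda handler or run locally against any directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;

class EfsPersistenceCheck {
    // Assumed mount path (placeholder); use the path configured for the function.
    static final Path DEFAULT_MOUNT = Paths.get("/mnt/efs");

    // Returns "created" the first time it runs against a directory and
    // "already present" on later runs, which shows the file survived.
    static String touchMarker(Path mountDir) {
        Path marker = mountDir.resolve("marker.txt"); // hypothetical file name
        try {
            if (Files.exists(marker)) {
                return "already present";
            }
            // Write the current timestamp so the file has some content.
            byte[] content = Instant.now().toString().getBytes(StandardCharsets.UTF_8);
            Files.write(marker, content);
            return "created";
        } catch (IOException e) {
            // A permissions problem on the mount would surface here.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = args.length > 0 ? Paths.get(args[0]) : DEFAULT_MOUNT;
        System.out.println(touchMarker(dir));
    }
}
```

Invoking it twice against the same directory should report "created" and then "already present". If the second invocation reports "created" again, the write is landing on the Lambda's ephemeral storage rather than on the EFS mount.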
