Elasticsearch on AWS or How I learned to stop worrying and love the Lucene index

Ahh, Elasticsearch, the cause of, and solution to, all of life’s problems.

I run a Logstash/Elasticsearch/Kibana cluster on EC2 as an application/system log aggregator for the web service I’m supporting. And it’s not been plain sailing. I have a limited AWS budget, so I’m somewhat restricted in the instances I can fire up. No cc2.8xlarges for me. So I was stuck with two m1.larges. And they struggled.

It was processing around 3 million documents per day for a total daily index size of around 4GB. And sometimes it coped and sometimes it didn’t. I often found myself restarting the Logstash and Elasticsearch services once or twice a week, sometimes losing 7-9 hours of processed logs.

And the most frustrating thing? I had no idea what I was doing wrong. Had I misconfigured something? Or were the instances just too small?

So I’ve upped my game a bit. Not without some trial and error. “Fake it ’til you make it”, as those of us without an extensive background in Lucene indices and grid clustering are fond of saying.

But I think I’ve cracked it. And this may be a good lesson for people starting out with a setup like this.

And the most important thing? I’ve done my homework and put some effort into getting my Elasticsearch config right. One line from my elasticsearch.yml, for instance:

transport.tcp.port: 9301
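
To give a flavour of what “getting the config right” involves, here is a rough sketch of the sort of elasticsearch.yml settings that matter for a small two-node EC2 cluster like this one. The cluster name, node name, IP addresses and port values below are illustrative, not my exact production config.

# elasticsearch.yml: illustrative settings for a small two-node EC2 cluster
cluster.name: logstash-cluster            # both nodes must share this name to form one cluster
node.name: "es-node-1"                    # illustrative; give each node its own name

# EC2 does not support multicast, so point the nodes at each other explicitly
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.10", "10.0.0.11"]   # illustrative private IPs
discovery.zen.minimum_master_nodes: 2     # majority of two master-eligible nodes, guards against split-brain

# Transport port for node-to-node traffic; the default is 9300, and a
# non-default port such as 9301 can avoid clashing with another local
# node (Logstash's embedded Elasticsearch, for example)
transport.tcp.port: 9301
http.port: 9200                           # default REST/Kibana port

# Stop the JVM heap being swapped out; pair this with a sensible
# ES_HEAP_SIZE (commonly around half the machine's RAM)
bootstrap.mlockall: true

# With only two nodes, one replica per shard is the sensible maximum
index.number_of_replicas: 1

The heap size and the discovery settings are the usual suspects when a small cluster behaves erratically, so they are worth checking first.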

For now, my problems seem to be mitigated. We’ll see how easy it is to scale the service in future as my user load increases.