Unit Testing HQL Scripts

Why is it so hard to find examples of unit testing HQL?

I do big data things with the help of Apache.

My current project uses Hive, Apache’s data warehouse software, which runs on top of Hadoop and seems to work well for all of our data mining needs. We pair Hive with the Hibernate ORM, so as a developer, I’m waist deep in .hql files most days.

“I need more hadoops!” is a common phrase in our office. - chuckles to self

I miss TDD so much.

I’m fresh off a C++ project by way of a Java project. I’ve had strong mentors who were very influential on my opinions and favorite development processes. I consider some of these processes vital to my well-being (there’s only a touch of hyperbole there). For example, on my first day of this big data project, I wrote three table scripts and NO TESTS. I wanted to cry myself to sleep that night because I felt like I was descending into utter chaos.

I’m a purist when it comes to process, meaning that if there is a proven optimal way, I will never choose to be suboptimal, no matter how lazy I feel that day. Or maybe it’s a consequence of my laziness. Why should I rediscover the “correct” way every time I run into a problem, when I could take a shortcut with a simple heuristic like “when in doubt, write a test”?

No tests?? This will not do!

I immediately took to the internets and scraped them for information on unit testing HQL files. Apparently, this is not a common ask among developers, especially not those working on Windows PCs. This is probably because most big data projects use a full-fledged programming language like Scala, which has well-established unit testing frameworks. Since I couldn’t change the project, I found a solution that worked with our existing project structure.

There’s a nifty open-source project out there called HiveQLUnit, which I found to be the easiest to adopt and the best supported.

But surely there were other options! How did you decide?

Apache does have a list of unit testing options.

I knew I didn’t want to test from the command line; I wanted something that could eventually be integrated into a continuous deployment pipeline. That left four candidates: HiveRunner and hive_test, neither of which I could get to work on a Windows machine; Apache’s internal testing framework, for which I couldn’t find any examples or documentation; and finally HiveQLUnit. With crossed fingers, I dove into the user guides in the README.md, and an hour later I had a working HQL unit test!

~ wipes sweat off forehead ~

Written on April 20, 2018