In this second part of the query language blog series, we look closely at the implementation of the lexer and parser.
The query language is mainly intended to identify targets against which to execute actions (attacks, checks, …). What are targets, though? For Steadybit, targets are observed hosts, Kubernetes workloads, API endpoints and more. Targets carry attributes, e.g., Kubernetes deployments carry the deployment name and the namespace in which they are deployed. When selecting targets to attack or check, it is beneficial to be able to reference all of these attributes and to combine conditions with boolean algebra.
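To illustrate what "all the attributes plus boolean algebra" means, here is a minimal Java sketch. It assumes, purely for illustration, that a target is a type plus a flat map of string attributes. The `Target` record and the `TargetFilters` helper are hypothetical and not part of Steadybit's code base. The query language ultimately has to express the same kind of combined predicate as text.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical model: a target is a type plus a flat map of string attributes.
record Target(String type, Map<String, String> attributes) {
    String attr(String key) {
        return attributes.get(key);
    }
}

// Hypothetical helpers that build attribute predicates which can be combined
// with Predicate.and / Predicate.or, i.e. plain boolean algebra.
final class TargetFilters {
    private TargetFilters() {}

    // Matches targets whose attribute `key` equals `value` exactly.
    static Predicate<Target> attributeEquals(String key, String value) {
        return t -> value.equals(t.attr(key));
    }

    // Matches targets whose attribute `key` starts with `prefix`.
    static Predicate<Target> attributeStartsWith(String key, String prefix) {
        return t -> {
            String v = t.attr(key);
            return v != null && v.startsWith(prefix);
        };
    }

    // Returns all targets matching the combined predicate.
    static List<Target> select(List<Target> targets, Predicate<Target> query) {
        return targets.stream().filter(query).collect(Collectors.toList());
    }
}
```

For example, `attributeEquals("k8s.deployment", "checkout").and(attributeStartsWith("k8s.namespace", "dev-"))` expresses roughly the same intent as the textual queries shown in the following paragraph.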
Of course, there is no need to reinvent the wheel: many suitable and widely used query languages already exist. Based on our experience and customer feedback, we were looking for a syntax similar to one of the following:
SQL WHERE clauses, e.g., k8s.deployment=checkout AND k8s.namespace LIKE 'dev-%'
the Lucene query language, e.g., k8s.deployment:checkout AND k8s.namespace:dev*
In the end, we settled on a variation of SQL WHERE clauses. Now, on to parsing this syntax!
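The SQL example relies on `LIKE` wildcards. As a hedged illustration of those semantics only (whether the final grammar keeps `LIKE` is not covered in this post), this sketch translates an SQL `LIKE` pattern into a `java.util.regex.Pattern`: `%` matches any sequence of characters, `_` matches exactly one, and everything else is quoted literally. The `LikePatterns` class is hypothetical, and the SQL `ESCAPE` clause is deliberately not supported.

```java
import java.util.regex.Pattern;

// Hypothetical helper: translates an SQL LIKE pattern into a Java regex.
// '%' -> any run of characters, '_' -> exactly one character,
// everything else is matched literally (no ESCAPE clause support).
final class LikePatterns {
    private LikePatterns() {}

    static Pattern toRegex(String like) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%' || c == '_') {
                flushLiteral(regex, literal);
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        flushLiteral(regex, literal);
        // DOTALL lets the wildcards match line breaks, too.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Appends pending literal characters in quoted form so that regex
    // metacharacters such as '.' or '*' lose their special meaning.
    private static void flushLiteral(StringBuilder regex, StringBuilder literal) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    // The whole value must match, as with SQL LIKE.
    static boolean matches(String value, String like) {
        return toRegex(like).matcher(value).matches();
    }
}
```

With this sketch, `LikePatterns.matches("dev-eu", "dev-%")` is true and `LikePatterns.matches("prod", "dev-%")` is false.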
ANTLR, a widely used parser generator, accepts lexer and parser grammars as input and generates lexers and parsers for various target languages, including Java. Maven build plugins, IntelliJ IDEA extensions, books, support articles and more make ANTLR a good choice for both simple and more advanced use cases.
The screenshot above depicts the ANTLR v4 plugin for IntelliJ IDEA, which you can install through the JetBrains Marketplace. You can enter text in the input field on the left for the plugin to parse. The panel on the right presents the resulting parse tree. Strictly speaking, this is a concrete syntax tree that contains every matched token; it is sometimes loosely called an abstract syntax tree. When working with an ANTLR-generated parser, you access the parse tree through an object hierarchy.
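To make the "object hierarchy" idea concrete without depending on generated code, here is a small hand-written sketch: interior nodes stand for grammar rules, leaves stand for tokens, and `toStringTree()` prints the tree in a LISP-like notation, similar in spirit to the output of ANTLR's `toStringTree(parser)`. The rule names `clause` and `term` and the node types are hypothetical; the real generated classes (subclasses of `ParserRuleContext`, with `TerminalNode` leaves) look different.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical parse-tree node: rules are interior nodes, tokens are leaves.
interface TreeNode {
    String toStringTree();
}

// A leaf holding the text of a single token, e.g. "=" or "checkout".
record TokenLeaf(String text) implements TreeNode {
    public String toStringTree() {
        return text;
    }
}

// An interior node named after the (hypothetical) grammar rule that produced it.
record RuleTree(String rule, List<TreeNode> children) implements TreeNode {
    public String toStringTree() {
        String inner = children.stream()
                .map(TreeNode::toStringTree)
                .collect(Collectors.joining(" "));
        return "(" + rule + " " + inner + ")";
    }
}
```

A tree for `k8s.deployment=checkout` built from these nodes prints as `(clause (term k8s.deployment) = (term checkout))`.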
Lexer and Parser Grammars
Authoring lexer and parser grammars can be very complex, much too complex to cover in a blog post such as this. Luckily, our query language is based on existing query languages, and thanks to ANTLR's popularity, there are plenty of open-source grammars we can draw inspiration from! An SQL lexer and parser would have been our second choice, since they come with many capabilities we do not need. Consequently, we looked for an open-source Lucene query language lexer and parser that we could adapt to our needs.
The ANTLR project maintains a handy collection of open-source grammars in its antlr/grammars-v4 repository. Among them is even an MIT-licensed Lucene query language lexer and parser! This enabled us to move more quickly and avoid common pitfalls, such as escaping, conjunction vs. disjunction precedence, and parenthesis handling. We adapted a few things here and there (using = instead of :), made the grammar more lenient (allowing lowercase AND and OR), and removed some syntax capabilities (number comparisons).
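Several of these pitfalls surface in the lexer. The following hand-written Java sketch (not the ANTLR-generated lexer and not our actual grammar) shows two of the adaptations in miniature: `AND`/`OR` are recognized case-insensitively, and quoted values may contain backslash-escaped characters. Token kinds and class names are hypothetical, and accepting both single and double quotes is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Token kinds for a simplified, hypothetical variant of the query language.
enum Kind { AND, OR, EQ, LPAREN, RPAREN, TERM }

record Token(Kind kind, String text) {}

final class SimpleQueryLexer {
    private SimpleQueryLexer() {}

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(' || c == ')' || c == '=') {
                Kind kind = c == '(' ? Kind.LPAREN : c == ')' ? Kind.RPAREN : Kind.EQ;
                tokens.add(new Token(kind, String.valueOf(c)));
                i++;
            } else if (c == '"' || c == '\'') {
                i = readQuoted(input, i, tokens);
            } else {
                int start = i;
                while (i < input.length() && !isDelimiter(input.charAt(i))) {
                    i++;
                }
                String word = input.substring(start, i);
                // Lenient keywords: AND, and, And (likewise OR) all become operators.
                switch (word.toUpperCase(Locale.ROOT)) {
                    case "AND" -> tokens.add(new Token(Kind.AND, word));
                    case "OR" -> tokens.add(new Token(Kind.OR, word));
                    default -> tokens.add(new Token(Kind.TERM, word));
                }
            }
        }
        return tokens;
    }

    // Reads a quoted value starting at `start`; a backslash escapes the next
    // character. Returns the index just past the closing quote.
    private static int readQuoted(String input, int start, List<Token> tokens) {
        char quote = input.charAt(start);
        StringBuilder value = new StringBuilder();
        int i = start + 1;
        while (i < input.length()) {
            char d = input.charAt(i);
            if (d == '\\' && i + 1 < input.length()) {
                value.append(input.charAt(i + 1));
                i += 2;
            } else if (d == quote) {
                tokens.add(new Token(Kind.TERM, value.toString()));
                return i + 1;
            } else {
                value.append(d);
                i++;
            }
        }
        throw new IllegalArgumentException("Unterminated quoted value starting at index " + start);
    }

    private static boolean isDelimiter(char c) {
        return Character.isWhitespace(c) || c == '(' || c == ')' || c == '='
                || c == '"' || c == '\'';
    }
}
```

One consequence of lenient keywords: a bare value spelled `and` or `or` is lexed as an operator, while quoting it (`name='and'`) keeps it a plain term.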
The snippet above shows the first few lines of our lexer's grammar. Most of it is still strikingly similar to the Lucene query language lexer grammar. The following snippet shows the first few lines of our parser's grammar. Again, most of it is still similar to the Lucene query language parser grammar.
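Grammars like these typically encode operator precedence by layering rules, so that conjunctions are parsed below disjunctions. The following hand-written recursive-descent parser is a hedged sketch of that layering, not the ANTLR-generated parser; the rule names in the comment are illustrative and not copied from our grammar. It returns a LISP-style string so the resulting grouping is easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical recursive-descent parser mirroring a layered grammar:
//   query       : disjunction EOF
//   disjunction : conjunction (OR conjunction)*
//   conjunction : clause (AND clause)*
//   clause      : '(' disjunction ')' | TERM '=' TERM
// Because conjunction sits below disjunction, AND binds tighter than OR.
final class PrecedenceParser {
    // A token is '(', ')', '=' or a run of other non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\\s*([()=]|[^\\s()=]+)");
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private PrecedenceParser(String input) {
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1));
        }
    }

    static String parse(String input) {
        PrecedenceParser p = new PrecedenceParser(input);
        String tree = p.disjunction();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + p.tokens.get(p.pos));
        }
        return tree;
    }

    private String disjunction() {
        String left = conjunction();
        while (isKeyword("OR")) {
            pos++;
            left = "(OR " + left + " " + conjunction() + ")";
        }
        return left;
    }

    private String conjunction() {
        String left = clause();
        while (isKeyword("AND")) {
            pos++;
            left = "(AND " + left + " " + clause() + ")";
        }
        return left;
    }

    private String clause() {
        if ("(".equals(peek())) {
            pos++;
            String inner = disjunction();
            expect(")");
            return inner;
        }
        String key = term();
        expect("=");
        return "(= " + key + " " + term() + ")";
    }

    // Keywords are matched case-insensitively (lenient AND/OR).
    private boolean isKeyword(String keyword) {
        String t = peek();
        return t != null && t.toUpperCase(Locale.ROOT).equals(keyword);
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    private String term() {
        String t = peek();
        if (t == null || t.equals("(") || t.equals(")") || t.equals("=")) {
            throw new IllegalArgumentException("Expected a term but found " + t);
        }
        pos++;
        return t;
    }

    private void expect(String expected) {
        if (!expected.equals(peek())) {
            throw new IllegalArgumentException("Expected '" + expected + "' but found " + peek());
        }
        pos++;
    }
}
```

With this sketch, `a=1 OR b=2 and c=3` yields `(OR (= a 1) (AND (= b 2) (= c 3)))`, whereas `(a=1 OR b=2) AND c=3` yields `(AND (OR (= a 1) (= b 2)) (= c 3))`.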
Generating the Lexer and Parser Code
With the scariest work out of the way, we can now turn to a more mundane problem: build definitions. We use Maven to build the component that requires the query language, so that is where we integrated the code-generation steps.
The image above shows our query-language module. It contains the grammars, the build logic that generates the Java lexer and parser, and some wrappers around the generated code. To generate the lexer and parser code, we use the antlr4-maven-plugin, as shown below.
This Maven build plugin generates additional sources within the target directory (by default, target/generated-sources/antlr4), as shown in the picture below. You will learn how to use these generated sources in the next section.
Usage in Java
What you do with the generated lexer and parser is up to you. Our main objective is to translate the generated parse tree into an object hierarchy that is compatible with our existing code base. To achieve this, we use the generated lexer, parser and visitor, as the code snippet below illustrates.
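With the real generated code, the pipeline typically reads the query via `CharStreams.fromString(...)`, feeds it to the generated lexer, wraps that in a `CommonTokenStream`, hands the stream to the generated parser, and passes the tree returned by the start rule to a subclass of the generated `*BaseVisitor` class. Because the generated classes are not reproduced here, the following self-contained sketch shows only the translation step: hand-written stand-ins for the parse-tree contexts, a visitor interface, and a visitor that builds a hypothetical "existing" domain hierarchy (`TargetCondition`, `AllOf`, `AnyOf`, `AttributeEquals`). All of these names are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;

// Hand-written stand-ins for ANTLR-generated context classes (hypothetical).
interface ParseNode {
    <T> T accept(QueryVisitor<T> visitor);
}

record OrNode(List<ParseNode> operands) implements ParseNode {
    public <T> T accept(QueryVisitor<T> visitor) { return visitor.visitOr(this); }
}

record AndNode(List<ParseNode> operands) implements ParseNode {
    public <T> T accept(QueryVisitor<T> visitor) { return visitor.visitAnd(this); }
}

record EqualsNode(String key, String value) implements ParseNode {
    public <T> T accept(QueryVisitor<T> visitor) { return visitor.visitEquals(this); }
}

// Plays the role of the generated visitor interface: one method per node type.
interface QueryVisitor<T> {
    T visitOr(OrNode node);
    T visitAnd(AndNode node);
    T visitEquals(EqualsNode node);
}

// Hypothetical domain hierarchy that the "existing code base" understands.
interface TargetCondition {
    boolean matches(Map<String, String> attributes);
}

record AllOf(List<TargetCondition> conditions) implements TargetCondition {
    public boolean matches(Map<String, String> attributes) {
        return conditions.stream().allMatch(c -> c.matches(attributes));
    }
}

record AnyOf(List<TargetCondition> conditions) implements TargetCondition {
    public boolean matches(Map<String, String> attributes) {
        return conditions.stream().anyMatch(c -> c.matches(attributes));
    }
}

record AttributeEquals(String key, String value) implements TargetCondition {
    public boolean matches(Map<String, String> attributes) {
        return value.equals(attributes.get(key));
    }
}

// Translates the parse tree into the domain hierarchy, node by node.
final class ConditionBuilder implements QueryVisitor<TargetCondition> {
    @Override
    public TargetCondition visitOr(OrNode node) {
        return new AnyOf(translateAll(node.operands()));
    }

    @Override
    public TargetCondition visitAnd(AndNode node) {
        return new AllOf(translateAll(node.operands()));
    }

    @Override
    public TargetCondition visitEquals(EqualsNode node) {
        return new AttributeEquals(node.key(), node.value());
    }

    // Recursively visits child nodes; each call dispatches via accept().
    private List<TargetCondition> translateAll(List<ParseNode> nodes) {
        return nodes.stream().map(n -> n.accept(this)).toList();
    }
}
```

Keeping the translation in a visitor means the rest of the code base only ever sees `TargetCondition` objects and never depends on parser classes directly.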