The JEQL Language Specification

Lexical Structure


The lexical structure of the JEQL language forms the basis of parsing and compilation.

Basic Structure

JEQL scripts are described by a stream of characters. The stream is parsed into a sequence of tokens, which forms the input to the JEQL parser. Tokens are determined by specified patterns of character sequences. Where the distinction between tokens might be ambiguous, they must be separated by whitespace.

Whitespace

Whitespace consists of one or more whitespace characters: spaces, tabs, and newlines. It separates tokens that could not otherwise be distinguished. It has no other syntactic purpose, so JEQL scripts can be freely formatted to improve readability.

Comments

Comments allow free-format documentation of scripts. They can also be used to prevent sections of a script from being parsed. Comment syntax follows the Java style. There are two types of comments:

    // A line comment extends to the end of the line
    /* A block comment may span
       multiple lines */

Tokens

Tokens are the basic symbols of the JEQL scripting language. Types of tokens include:

Numeric Literals

Numeric literals allow encoding numeric values in the script.

For readability, "_" characters may appear anywhere in a numeric literal except as its first character. For example, this allows 1,000,000 to be written as "1_000_000".

JEQL provides the following numeric types:

Type                              Examples
Integer                           0
                                  123
                                  1_000_000
Double Precision Floating Point   0.0
                                  123.456
                                  123_456.456_123
                                  1.0E10
                                  123.456E-3
                                  123_456.789e-3
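
The literal forms in the table can be summarized as patterns. The following Python sketch is an illustration only, not JEQL's actual lexer. The regular expressions and the `parse_numeric` helper are assumptions inferred from the examples above: that a double literal always contains a decimal point, that each digit group starts with a digit, and that underscores are simply dropped before conversion.

```python
import re

# Hypothetical patterns inferred from the examples in the table above.
# A digit comes first; "_" may appear anywhere after it.
DIGITS = r"[0-9][0-9_]*"
INTEGER = re.compile(DIGITS)
DOUBLE = re.compile(rf"{DIGITS}\.{DIGITS}(?:[Ee][+-]?[0-9]+)?")


def parse_numeric(text):
    """Return an int or float for a numeric literal, or raise ValueError."""
    if INTEGER.fullmatch(text):
        return int(text.replace("_", ""))
    if DOUBLE.fullmatch(text):
        return float(text.replace("_", ""))
    raise ValueError(f"not a numeric literal: {text!r}")
```

Under these assumptions, `parse_numeric("1_000_000")` yields an integer and `parse_numeric("1.0E10")` yields a float, while a literal with a leading underscore such as `_1` is rejected.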

String Literals

String literals allow encoding textual data directly in the script.

String Quoting

To make it easy to include arbitrary data, String literals support flexible quoting. Generally, String literals are delimited by quote characters. Both " and ' are allowed as paired opening and closing quote characters. Within a String literal, any occurrence of the delimiting quote character must be escaped.

For example, the string He said "It's ready" can be encoded as

    "He said \"It's ready\""  or  'He said "It\'s ready"'

String literals can be written in three forms, each of which offers a different way of specifying character data.

Escaped Strings

Escaped strings are similar to Java-style strings. Their syntax is:

    "string"  or  'string'

Escaped strings support a set of escape codes, which allow non-printing characters to be represented. The escape character is the backslash '\'. The escape sequences provided are:

    \b    backspace        BS  \u0008
    \t    horizontal tab   HT  \u0009
    \n    linefeed         LF  \u000a
    \f    form feed        FF  \u000c
    \r    carriage return  CR  \u000d
    \"    double quote     "   \u0022
    \'    single quote     '   \u0027
    \\    backslash        \   \u005c
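
To make the escape table concrete, the following Python sketch decodes an escaped string literal into its value. It is an illustration rather than the JEQL implementation. The `decode_escaped` helper is hypothetical, and rejecting unknown escape sequences is an assumption, since the specification does not say how they are handled.

```python
# Escape codes from the table above, mapped to the characters they denote.
ESCAPES = {
    "b": "\b",
    "t": "\t",
    "n": "\n",
    "f": "\f",
    "r": "\r",
    '"': '"',
    "'": "'",
    "\\": "\\",
}


def decode_escaped(literal):
    """Decode a quoted escaped-string literal (quotes included)."""
    if len(literal) < 2 or literal[0] not in "\"'" or literal[-1] != literal[0]:
        raise ValueError("literal must be wrapped in matching quotes")
    quote = literal[0]
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # Assumption: an unknown or dangling escape is an error.
            if i + 1 >= len(body) or body[i + 1] not in ESCAPES:
                raise ValueError("unknown or incomplete escape sequence")
            out.append(ESCAPES[body[i + 1]])
            i += 2
        elif ch == quote:
            raise ValueError("unescaped quote inside literal")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Both quoting styles shown earlier for He said "It's ready" decode to the same value with this sketch, which is the point of allowing either quote character as the delimiter.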

Raw Strings

Raw strings do not support escape codes, so the backslash character can be used in the string without escaping it. They are useful for strings that make substantial use of backslashes as their own quoting mechanism (for example, regular expression patterns).

Raw string syntax is:

    \"string"  or  \'string'

An example is the following use of a raw string to clearly express a complex RegEx pattern for recognizing floating-point numbers:

    numPat = \"\d*\.\d*([Ee]?[\+\-]?\d+)?";
    Assert RegEx.matches("123.45", numPat);
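
For readers more familiar with other languages, Python's `r"..."` literals are a loose analogue of JEQL raw strings. The sketch below reuses the pattern from the example above to show what the raw form saves. The `matches` helper is hypothetical, and treating `RegEx.matches` as a whole-string match is an assumption.

```python
import re

# Python's r"..." prefix plays a role similar to JEQL's \"..." raw strings:
# backslashes are kept literally, so the pattern reads as written.
num_pat = r"\d*\.\d*([Ee]?[\+\-]?\d+)?"

# Without a raw string, every backslash has to be doubled.
num_pat_escaped = "\\d*\\.\\d*([Ee]?[\\+\\-]?\\d+)?"


def matches(text, pattern):
    """Whole-string match, assumed here to mirror RegEx.matches."""
    return re.fullmatch(pattern, text) is not None
```

The two pattern strings hold identical characters. The raw form is simply easier to read and to check against the intended expression.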

Rich Strings

Rich strings provide multi-line strings and variable substitution. Rich strings support the same set of escape codes as Escaped Strings.

Rich string syntax is:

    $"string"  or  $'string'

Rich strings allow strings to be spread over multiple lines. The EOL character(s) are included as part of the string value. This allows readable formatting of long, complex string values such as SQL statements.

The following example shows a multi-line string.

s = $"This is a
multiline
string";

Rich strings support in-line variable substitution. Variables are indicated by the following syntax:

    $name  or  ${name}

To represent an actual $ character, it must be escaped: \$. The final value of the rich string is determined by substituting the current value of each variable in the string. Variables include both top-level variables and columns in select expressions.

The following example shows variable substitution and literal $ signs.

word = "This";
letter = "t";
Assert $"$word is a \$${letter}est" == "This is a $test";
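
The substitution rule in this example can be sketched in Python as follows. This is a simplified illustration, not JEQL's implementation. The `substitute` helper and the identifier pattern (`\w+`) are assumptions, and the other escape codes supported by rich strings are not processed here.

```python
import re

# Matches an escaped dollar (\$), a braced variable (${name}),
# or a bare variable ($name). The identifier pattern is an assumption.
TOKEN = re.compile(r"\\\$|\$\{(\w+)\}|\$(\w+)")


def substitute(body, variables):
    """Expand variables in the body of a rich string."""
    def replace(match):
        if match.group(0) == "\\$":
            return "$"  # \$ stands for a literal dollar sign
        name = match.group(1) or match.group(2)
        return str(variables[name])

    return TOKEN.sub(replace, body)
```

Applied to the body of the rich string above with `word` set to "This" and `letter` set to "t", the sketch produces the same value as the assertion. The braces matter in that example: without them, the variable name would run on into the following letters and be read as `letterest`.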

Boolean Literals

Boolean literals allow encoding boolean data in the script. The literal values supported are:
  true    false

Geometry Literals

Geometry literals allow directly specifying geometry values in the script. Geometry literals follow the OGC WKT specification. They allow specifying Points, LineStrings, Polygons (with holes), MultiPoints, MultiLineStrings, MultiPolygons, and GeometryCollections.

A (non-standard) Box literal syntax is also provided. BOX (x1 y1, x2 y2) is equivalent to POLYGON ((x1 y1, x1 y2, x2 y2, x2 y1, x1 y1)).

Examples of geometry literals are:

POINT(1 2)

LINESTRING(1 2, 3 4)

POLYGON ((50 70, 70 70, 70 50, 50 50, 50 70))

MULTIPOINT ((50 50), (50 100), (100 100))

MULTILINESTRING ((0 0, 50 50),
  (0 50, 50 100))

MULTIPOLYGON (((50 70, 70 70, 70 50, 50 50, 50 70)),
  ((20 50, 50 50, 50 20, 20 20, 20 50)))

GEOMETRYCOLLECTION (POINT (20 100),
  LINESTRING (0 0, 50 50),
  LINESTRING (0 50, 50 100),
  POLYGON ((80 70, 100 70, 100 50, 80 50, 80 70)))

BOX (0 0, 100 100)
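
The relationship between a Box literal and its equivalent polygon can be sketched in Python. The `box_to_polygon_wkt` helper is hypothetical and only illustrates the vertex order given in the Box equivalence earlier in this section. It is not part of JEQL.

```python
def box_to_polygon_wkt(x1, y1, x2, y2):
    """Return the POLYGON WKT equivalent to BOX (x1 y1, x2 y2)."""
    # The ring starts at (x1, y1) and closes by repeating that vertex.
    ring = [(x1, y1), (x1, y2), (x2, y2), (x2, y1), (x1, y1)]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"
```

For the literal BOX (0 0, 100 100), this gives POLYGON ((0 0, 0 100, 100 100, 100 0, 0 0)).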