Learn JQ the Hard Way, Part III - Filters

Simple Filters

In this section we introduce the most frequently used feature of jq: the filter. Filters allow you to reduce a given stream of JSON to another smaller, more refined stream of JSON, which you can then filter or process further if you want.

How Important is this Post?

Filters are fundamental to using jq, so this post’s content is essential to understand.

We will cover:

  • The most commonly-used filters
  • Selecting values from a stream of JSON
  • Railroad diagrams, and how arrays cannot contain name-value pairs in JSON

Setup

Create a folder to work in, and move into it:

$ mkdir ljqthw_nv
$ cd ljqthw_nv

Now create a simple JSON document to work with:

$ echo '{"user1": "alice", "user2": "bob"}' > json_object

This file contains a simple JSON object with two name-value pairs: user1 and user2 are the names, and alice and bob are their respective values.

The Dot Filter

The concept of the filter is central to jq. The filter allows us to select those parts of the JSON document that we might be interested in.

The simplest filter – and one you have already come across – is the ‘dot’ filter. This filter is a simple period character (.) and doesn’t do any selection on the data at all:

$ jq . json_object

Note that here we are using the filename as the last argument to jq, rather than passing the data through a UNIX pipe with cat.
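As a quick check of the two ways of feeding jq its input, here is a minimal sketch. It recreates the sample file so that it is self-contained; the variable names from_file and from_pipe are only for illustration.

```shell
# Recreate the sample file so this sketch is self-contained.
echo '{"user1": "alice", "user2": "bob"}' > json_object

# Filename as the last argument, as in the post:
from_file=$(jq . json_object)

# The same document passed through a UNIX pipe instead:
from_pipe=$(cat json_object | jq .)

# Both forms give the whole object back, pretty-printed by jq.
echo "$from_file"
```

The dot filter has not selected anything here: the output is the same object, just re-indented over several lines.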

Arrays, Name-Value Pairs, and Railroad Diagrams

Now let’s try and create a similar array with the same name-value pairs, and run the dot filter against it:

$ echo '["user1": "alice", "user2": "bob"]' > json_array
$ jq . json_array

Was that what you expected? When I ran this I expected it to just work, based on my experience of data structures in other languages. But no. An array in JSON cannot contain a bare name-value pair as one of its values: a name-value pair on its own is not a JSON value, and arrays must be composed of JSON values (such as strings, numbers, or objects).
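To make the contrast concrete, the sketch below runs jq against the invalid array and against two valid alternatives. The file names json_array_values and json_array_objects are made up for this illustration, and the exact wording of jq's error message may differ between versions, so only the exit status is recorded.

```shell
# The array from the post: name-value pairs placed directly inside [ ].
echo '["user1": "alice", "user2": "bob"]' > json_array
invalid_status=0
jq . json_array || invalid_status=$?   # jq reports a parse error here

# Hypothetical file: an array of plain string values is valid JSON.
echo '["alice", "bob"]' > json_array_values
values_status=0
jq . json_array_values || values_status=$?

# Hypothetical file: an array whose values are objects, each holding one pair.
echo '[{"user1": "alice"}, {"user2": "bob"}]' > json_array_objects
objects_status=0
jq . json_array_objects || objects_status=$?

echo "invalid=$invalid_status values=$values_status objects=$objects_status"
```

Only the first command should report a non-zero exit status; wrapping each pair in its own object is one way to keep the names while still using an array.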

What can an array contain? Have a look at this railroad diagram, from https://json.org/:

Railroad diagram

The above diagram defines what an array consists of. Make sure you understand how to read the diagram before continuing, as being able to read such diagrams is useful in many contexts in software development.

Railroad Diagram
A railroad diagram is also known as a syntax diagram.
It visually defines the syntax for a particular language
or format. As your code or document is read, you can
follow the line, choosing which path to take as it splits.
If your code or document can be traced through the
'railroad', then it is syntactically correct. As far as I can
tell, there is no 'official' format for railroad diagrams
but the conventional signs can easily be found
and deciphered by searching online for examples.

The ‘value’ in the array diagram above is defined here:

Railroad diagram

Following the value diagram, you can see that there is no ‘name-value’ pair defined within a value. Name-value pairs are defined in the object railroad diagram:

Railroad diagram

A JSON object consists of zero or more name-value pairs, separated by commas, making it fundamentally different from an array, which contains zero or more JSON values (not name-value pairs), also separated by commas.
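If you want to see the kinds of value from the value diagram side by side, the following sketch may help. It uses jq's built-in map and type functions, which this post does not otherwise cover, and a made-up file name json_array_mixed.

```shell
# Hypothetical file: an array holding one of each kind of JSON value.
echo '[1, "two", true, null, {"user1": "alice"}, ["nested"]]' > json_array_mixed

# map(type) replaces each element with the name of its JSON type;
# -c asks jq for compact, single-line output.
types=$(jq -c 'map(type)' json_array_mixed)
echo "$types"
# prints ["number","string","boolean","null","object","array"]
```

Note that the name-value pair only appears inside the object; it is never an array element in its own right.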

It’s worth understanding these diagrams, as they can help you a lot as you try and create and parse JSON with jq. There are further diagrams on the https://json.org site (string, number, whitespace). They are not reproduced in this section, as you only need to reference them in the rarer cases when you are not sure (for example) whether something is a number or not in JSON.

Our First Query – Selecting A Value

Now, back to our original document.

$ cat json_object
{"user1": "alice", "user2": "bob"}

Let’s say we want to know what user2’s value is in this document. To do this, we need a filter that outputs only the value for a specific name.

First, we can select the object, and then select by name:

$ jq '.user2' json_object

This is our first proper query with jq. We’ve filtered the contents of the file down to the information we want.
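For reference, here is a small sketch of what this kind of query returns. The -r (--raw-output) flag and the lookup of a name that does not exist (user3) are additions for illustration and are not covered in the post itself.

```shell
# Recreate the sample file so this sketch is self-contained.
echo '{"user1": "alice", "user2": "bob"}' > json_object

quoted=$(jq '.user2' json_object)     # "bob"  (a JSON string, quotes included)
raw=$(jq -r '.user2' json_object)     # bob    (-r prints the string without quotes)
missing=$(jq '.user3' json_object)    # null   (no such name in the object)

echo "$quoted"
echo "$raw"
echo "$missing"
```

The default output is itself valid JSON, which is why the quotes are kept unless you ask for raw output.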

In the above we just put the bare string user2 after the dot filter. jq accepts this, but you can also refer to the name by placing it in quotes, or placing it in quotes inside square brackets:

$ jq '.["user2"]' json_object
$ jq '."user2"' json_object

However, using the square brackets without the quotes does not work:

$ jq '.[user2]' json_object

This is because the [""] form is the general way to look up a name in an object, and the name inside the brackets must be a string. Without the quotes, jq tries to interpret user2 as a function name rather than as a string, and fails. The simpler .user2 and ."user2" forms we saw above are just a shorthand for the bracketed form, so you don’t need the four characters of scaffolding ([""]) around the name. As you are learning jq it may be better to use the longer form to embed the ‘correct’ form in your mind.
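The forms discussed above can be compared directly. This is a minimal sketch; the second file, json_object_hyphen, and its name user-3 are hypothetical and are there only to show a case where the quoted form is needed.

```shell
# Recreate the sample file so this sketch is self-contained.
echo '{"user1": "alice", "user2": "bob"}' > json_object

short=$(jq '.user2' json_object)
quoted=$(jq '."user2"' json_object)
bracketed=$(jq '.["user2"]' json_object)

# Unquoted name inside brackets: jq rejects the filter and exits non-zero.
bare_status=0
jq '.[user2]' json_object 2>/dev/null || bare_status=$?

# Hypothetical object whose name contains a hyphen, so it needs quoting.
echo '{"user-3": "carol"}' > json_object_hyphen
hyphen=$(jq -r '.["user-3"]' json_object_hyphen)

echo "$short $quoted $bracketed (bare form exit status: $bare_status) $hyphen"
```

The three working forms return the same value, so the choice between them is mostly a matter of what the name looks like.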

Types of Quote Are Important

It’s important to know that the type of quote you use in your JSON is significant to jq. Type this in, and think about how it differs from the ‘non-broken’ JSON above.

$ echo "['user1': 'alice', 'user2': 'bob']" > json_array_broken
$ jq '.' json_array_broken
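To isolate the effect of the quotes from the array problem seen earlier, the sketch below compares two objects that differ only in their quote characters. The file names json_double_quotes and json_single_quotes are made up for this illustration, and only the exit status is recorded because the error text can vary between jq versions.

```shell
# Hypothetical file: names and values in double quotes, as JSON requires.
echo '{"user1": "alice"}' > json_double_quotes
double_status=0
jq . json_double_quotes || double_status=$?

# Hypothetical file: the same object with single quotes instead.
echo "{'user1': 'alice'}" > json_single_quotes
single_status=0
jq . json_single_quotes || single_status=$?   # jq reports a parse error here

echo "double=$double_status single=$single_status"
```

Single quotes are still fine around the jq filter itself, because there they are shell quoting and never reach the JSON parser.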

What You Learned

  • The ‘dot’ filter
  • How to select the value of a name from an object
  • The various allowed forms for performing the selection
  • The syntactic difference between an object, an array, and a name-value pair
  • The types of quote are important

Exercises

1) Create another json_array file as per this post, but this time with two JSON objects (i.e. the same line twice). Re-run the queries above and see what happens; this demonstrates that jq works on a stream of JSON objects.
