At Container Solutions, we spend a significant amount of engineering time wrangling JSON requests and responses to and from various APIs. While traditional text-processing tools such as grep, sed, and awk can go some of the way to extracting and transforming information from these documents as a one-off, when you want to do this reliably and programmatically, you need a tool that is consistent and rigorous. jq is that tool.
This jq series has been written to help users gain a deeper understanding of, and proficiency in, jq. It doesn’t aim to make you an expert immediately, but you will be more confident about using it and building your knowledge up from that secure base.
You may well have already played with jq a little – maybe been given a jq command to run by someone else, found a useful one-liner from StackOverflow, or hacked something together quickly that ‘does the job’, but without really understanding what you did. While that’s a great way to get going, a guided course that shows you how the pieces fit together as you use them really helps you go further. Understanding these pieces enables you to be more creative, solving your own challenges in ways that work for your problem domain.
The ‘Hard Way’ is a method that emphasises the process required to learn anything. You don’t learn to ride a bike by reading about it, and you don’t learn to cook by reading recipes. Content can help (hopefully, this does) but it’s up to you to do the work.
This book shows you the path in small digestible pieces and tells you to actually type out the code. This is as important as riding a bike is to learning to ride a bike. Without the brain and the body working together, the knowledge does not properly seep in.
Before we get hands on with jq, it’s important to know what JSON is, and what it is not.
In this post, we cover:
This is an introductory post, but an important one.
Even if you’ve seen JSON before, I strongly encourage you to read over this, because a clear grasp of the terminology will help enormously when reading jq docs later. In fact, a good working understanding of the terminology is the best way to avoid confusion when using jq.
JSON is a ‘data interchange format’. This is a fancy way of saying that it is a standardised way to write information and send it to other entities who can understand what it means.
You might have heard of (or even used) other data-interchange formats, such as XML, CSV, Apache Parquet, and YAML. Each of these formats has its benefits and disadvantages relative to the others. CSV is very simple and easily understood but is not very good at expressing complex nested information, and can be ambiguous in how it represents data. XML allows for very complex data to be encapsulated but can be verbose and hard for humans to parse. YAML is optimised for human readability, allowing comments and using whitespace rather than special characters to delimit.
JSON is ubiquitous for a few reasons. First, it is simple, being easily read by anyone familiar with standard programming languages. Second, it is natively understood by JavaScript, a very popular programming language in the IT industry. Third, parsers for it are readily available as libraries in many programming languages.
Here is an example JSON object.
{
  "accounting": [
    {
      "firstName": "Alice",
      "lastName": "Zebra",
      "building": "7a",
      "age": 19
    },
    {
      "firstName": "Bob",
      "lastName": "Young",
      "age": 28
    }
  ],
  "sales": [
    {
      "firstName": "Celia",
      "lastName": "Xi",
      "building": "Jefferson",
      "age": 37
    },
    {
      "firstName": "Jim",
      "lastName": "Galley",
      "age": 46
    }
  ]
}
The above JSON represents two departments of a workplace and their employees. The departments are held in a ‘collection’ of name-value pairs (an object). "accounting" and "sales" are the names, and each value is an ordered list (known as an array) of objects, one per employee, each of which is itself a collection of name-value pairs.
Anything enclosed within a pair of curly braces (‘{’ and ‘}’) is an object. Anything enclosed within a pair of square brackets (‘[’ and ‘]’) is an array.
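To make the distinction concrete, here is a minimal sketch in Python. It uses Python's standard json module purely for illustration, and the variable names are made up for this example. It relies only on the fact that this module loads a JSON object as a dict and a JSON array as a list.

```python
import json

# A small document: an object whose single value is an array of objects.
document = '{"accounting": [{"firstName": "Alice"}, {"firstName": "Bob"}]}'

parsed = json.loads(document)

# Curly braces: a JSON object, which Python's json module loads as a dict.
outer_type = type(parsed).__name__
print(outer_type)  # dict

# Square brackets: a JSON array, which is loaded as a list.
value_type = type(parsed["accounting"]).__name__
print(value_type)  # list

# Each item in the array is itself an object.
item_type = type(parsed["accounting"][0]).__name__
print(item_type)  # dict
```

If you can predict each printed type just by looking at the brackets in the document, you have the terminology straight.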
It might sound theoretical, but it’s really important that you understand the above terminology, or at least understand that it’s important. Most jq documentation makes these distinctions carefully, but some uses the terms wrongly or loosely, which can cause great confusion. When you look at JSON as you read this book, be sure you can explain what it is in clear and correct terms to yourself and others.
The format is flexible, allowing the objects within an array to have different name-value pairs. Here, the “building” name is in Celia’s and Alice’s entries, but not in Jim’s or Bob’s.
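Code that reads such a document has to cope with names that are sometimes absent. The following is a small sketch in Python, assuming a cut-down version of the accounting department above. The variable names are illustrative only.

```python
import json

employees = json.loads("""
[
  {"firstName": "Alice", "lastName": "Zebra", "building": "7a", "age": 19},
  {"firstName": "Bob", "lastName": "Young", "age": 28}
]
""")

# dict.get returns None (rather than raising KeyError) when a name is absent,
# so Bob's missing "building" does not stop the loop.
buildings = [employee.get("building") for employee in employees]
print(buildings)  # ['7a', None]
```

Using employee["building"] instead would raise a KeyError on Bob's entry, which is the kind of surprise that flexible documents can cause.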
A JSON document can be an object or an array. Here is the same document as above, but in an array rather than an object.
[
  {
    "accounting": [
      {
        "firstName": "Alice",
        "lastName": "Zebra",
        "building": "7a",
        "age": 19
      },
      {
        "firstName": "Bob",
        "lastName": "Young",
        "age": 28
      }
    ]
  },
  {
    "sales": [
      {
        "firstName": "Celia",
        "lastName": "Xi",
        "building": "Jefferson",
        "age": 37
      },
      {
        "firstName": "Jim",
        "lastName": "Galley",
        "age": 46
      }
    ]
  }
]
In this document, the departments are in a specific order, because they are placed in an array rather than in an object.
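As a rough illustration of what that ordering gives you, here is a Python sketch that loads a shortened version of the array form and picks out departments by position. The shortened data and the variable names are assumptions made for this example.

```python
import json

departments = json.loads("""
[
  {"accounting": [{"firstName": "Alice"}, {"firstName": "Bob"}]},
  {"sales": [{"firstName": "Celia"}, {"firstName": "Jim"}]}
]
""")

# Arrays are ordered, so position is meaningful: index 0 is the first element.
first_department = departments[0]
second_department = departments[1]

# Iterating over each single-name object yields its name.
department_names = [name for department in departments for name in department]
print(department_names)  # ['accounting', 'sales']
```

With the array form you ask for a department by its position; with the object form you ask for it by its name.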
In the above passage, the key terms to grasp are:
We will cover these in more depth later in this series, but for now just be aware that these names exist, and that understanding them is key to mastering jq.
JSON arose from JavaScript’s need for a way to communicate between processes on different hosts in an agreed format. It was established as a standard around the turn of the century, and any JavaScript interpreter now understands JSON out of the box.
JSON is not specific to JavaScript. It was invented for JavaScript, but is now a general-purpose format that is well-supported by many languages.
Here is an example of an interactive Python session parsing a simplified version of the above JSON into a Python dictionary.
$ python3
>>> json_str = '{"sales": [{"name": "Alice"}], "accounting": [{"name": "Bob"}]}'
>>> import json
>>> json_parsed = json.loads(json_str)
>>> json_parsed
{'sales': [{'name': 'Alice'}], 'accounting': [{'name': 'Bob'}]}
>>> type(json_parsed)
<class 'dict'>
>>> json_parsed['sales']
[{'name': 'Alice'}]
>>> json_parsed['sales'][0]
{'name': 'Alice'}
>>> json_parsed['sales'][0]['name']
'Alice'
Many engineers today make extensive use of YAML as a configuration language. JSON and YAML express very similar document content, but they look different. YAML is easier for humans to read than JSON, and also allows for comments in its documents.
Technically, JSON can be converted into YAML without any loss of information. But this conversion cannot always go both ways. YAML has a few extra features, such as comments and ‘anchors’ that allow you to reference other items within the same document, which have no JSON equivalent and so cannot be preserved when converting back to JSON.
JSON can have a nested structure. This means that any value within a JSON object or array can have the same structure as the whole document. In other words, every value could itself be a JSON document. So each of the following lines is a valid JSON document:
{}
"A string"
{ "A name" : {} }
{ "A name" : [] }
and this one is not valid:
{ {} }
because an object can only contain name-value pairs, and the inner {} is a ‘value’ with no name attached to it.
This one is also not valid:
{ Thing }
because strings need to be quoted (just as in JavaScript), and even a quoted string on its own would not be a name-value pair.
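You can check these claims yourself with any JSON parser. Here is a minimal sketch using Python's standard json module. The helper is_valid_json is a hypothetical name invented for this example, and Python's parser is only one implementation, so treat its verdicts as a sanity check rather than the final word on the standard.

```python
import json

# The example documents from the text above, valid and invalid alike.
candidates = [
    '{}',
    '"A string"',
    '{ "A name" : {} }',
    '{ "A name" : [] }',
    '{ {} }',
    '{ Thing }',
]


def is_valid_json(text):
    """Return True if Python's json module can parse text, else False."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


results = {text: is_valid_json(text) for text in candidates}
for text, valid in results.items():
    print(f"{text!r}: {'valid' if valid else 'not valid'}")
```

The first four candidates should be reported as valid and the last two as not valid.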
We will go into more detail on name-value pairs in an upcoming post.
1) Read the page https://www.json.org/json-en.html
2) Pick a programming language of your choice and parse a JSON document into it