Hacker Public Radio / HPR4104: Introduction to jq - part 1

Description

Introduction

This is the start of a short series about the JSON data format, and how the command-line tool jq can be used to process such data. The plan is to make an open series to which others may contribute their own experiences using this tool.

The jq command is described on the GitHub page as follows:

    jq is a lightweight and flexible command-line JSON processor

…and as:

    jq is like sed for JSON data - you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.

The jq tool is controlled by a programming language (also referred to as jq), which is very powerful. This series will mainly deal with this language.

JSON (JavaScript Object Notation)

To begin we will look at JSON itself. It is defined on the Wikipedia page thus:

    JSON is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values). It is a common data format with diverse uses in electronic data interchange, including that of web applications with servers.

The syntax of JSON is defined by RFC 8259 and by ECMA-404. It is fairly simple in principle but has some complexity.

JSON’s basic data types are (edited from the Wikipedia page):

- Number: a signed decimal number that may contain a fractional part and may use exponential E notation, but cannot include non-numbers. (NOTE: Unlike what I said in the audio, there are two values representing non-numbers: 'nan' for not-a-number and 'infinity' for infinity. RFC 8259 does not permit either of them as a JSON number.)
- String: a sequence of zero or more Unicode characters. Strings are delimited with double quotation marks and support a backslash escaping syntax.
- Boolean: either of the values true or false
- Array: an ordered list of zero or more elements, each of which may be of any type. Arrays use square bracket notation with comma-separated elements.
- Object: a collection of name–value pairs where the names (also called keys) are strings. Objects are delimited with curly brackets and use commas to separate each pair, while within each pair the colon ':' character separates the key or name from its value.
- null: an empty value, using the word null

Examples

These are the basic data types listed above (same order):

    42
    "HPR"
    true
    ["Hacker","Public","Radio"]
    { "firstname": "John", "lastname": "Doe" }
    null

jq

From the Wikipedia page:

    jq was created by Stephen Dolan, and released in October 2012. It was described as being “like sed for JSON data”. Support for regular expressions was added in jq version 1.5.

Obtaining jq

This tool is available in most of the Linux repositories. For example, on Debian and Debian-based releases you can install it with:

    sudo apt install jq

See the download page for the definitive information about available versions.

Manual for jq

There is a detailed manual describing the use of the jq programming language that is used to filter JSON data. It can be found at https://jqlang.github.io/jq/manual/.

The HPR statistics page

This is a collection of statistics about HPR, in the form of JSON data. We will use this as a moderately detailed example in this episode.

A link to this page may be found on the HPR Calendar page, close to the foot of the page under the heading Workflow. The link to the JSON statistics is https://hub.hackerpublicradio.org/stats.json.

If you click on this you should see the JSON data formatted for you by your browser. Different browsers represent this in different ways.

You can also collect and display this data from the command line, using jq of course:

    $ curl -s https://hub.hackerpublicradio.org/stats.json | jq '.' | nl -w3 -s' '
      1 {
      2   "stats_generated": 1712785509,
      3   "age": {
      4     "start": "2005-09-19T00:00:00Z",
      5     "rename": "2007-12-31T00:00:00Z",
      6     "since_start": {
      7       "total_seconds": 585697507,
      8       "years": 18,
      9       "months": 6,
     10       "days": 28
     11     },
     12     "since_rename": {
     13       "total_seconds": 513726307,
     14       "years": 16,
     15       "months": 3,
     16       "days": 15
     17     }
     18   },
     19   "shows": {
     20     "total": 4626,
     21     "twat": 300,
     22     "hpr": 4326,
     23     "duration": 7462050,
     24     "human_duration": "0 Years, 2 months, 27 days, 8 hours, 47 minutes and 30 seconds"
     25   },
     26   "hosts": 356,
     27   "slot": {
     28     "next_free": 8,
     29     "no_media": 0
     30   },
     31   "workflow": {
     32     "UPLOADED_TO_IA": "2",
     33     "RESERVE_SHOW_SUBMITTED": "27"
     34   },
     35   "queue": {
     36     "number_future_hosts": 7,
     37     "number_future_shows": 28,
     38     "unprocessed_comments": 0,
     39     "submitted_shows": 0,
     40     "shows_in_workflow": 15,
     41     "reserve": 27
     42   }
     43 }

The curl utility is useful for collecting information from links like this. I have used the -s option to stop it showing information about the download process, which it does by default. The output is piped to jq, which displays the data in a “pretty printed” form by default, as you see. In this case I have given jq a minimal filter which causes what it receives to be printed. The filter is simply '.'.

I have piped the formatted JSON through the nl command to get line numbers for reference.

The JSON shown here consists of nested JSON objects. The opening brace on line 1 and the closing brace on line 43 enclose the whole thing as a single object.

Briefly, the object contains the following:

- a number called stats_generated (line 2)
- an object called age on lines 3-18; this object contains two strings and two objects
- an object called shows on lines 19-25
- a number called hosts on line 26
- an object called slot on lines 27-30
- an object called workflow on lines 31-34
- an object called queue on lines 35-42

We will look at ways to summarise and reformat such output in a later episode.
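As a quick cross-check of the breakdown above, jq itself can report the JSON type of each member. The sketch below is not from the episode. It assumes a trimmed, hand-copied subset of the stats.json output, held in a shell variable (named stats here purely for illustration) so that no network access is needed. It uses the jq built-ins map_values and type, which belong to the filter language that later episodes deal with properly.

```shell
#!/usr/bin/env bash
# Assumption: a trimmed copy of the stats.json output shown above, embedded
# here so that the example does not depend on the live URL.
stats='{
  "stats_generated": 1712785509,
  "age": { "start": "2005-09-19T00:00:00Z", "rename": "2007-12-31T00:00:00Z" },
  "shows": { "total": 4626, "twat": 300, "hpr": 4326 },
  "hosts": 356,
  "slot": { "next_free": 8, "no_media": 0 },
  "workflow": { "UPLOADED_TO_IA": "2", "RESERVE_SHOW_SUBMITTED": "27" },
  "queue": { "number_future_hosts": 7, "number_future_shows": 28 }
}'

# map_values(type) replaces each member's value with the name of its JSON
# type; -c asks jq for compact, single-line output.
top_level=$(printf '%s\n' "$stats" | jq -c 'map_values(type)')
printf '%s\n' "$top_level"

# The same check one level down, on the workflow object: its counts are
# written in double quotes, so jq reports them as strings, not numbers.
workflow=$(printf '%s\n' "$stats" | jq -c '.workflow | map_values(type)')
printf '%s\n' "$workflow"
```

The first command should report stats_generated and hosts as numbers and the other five members as objects, matching the list above. Against the live data the same filter could be used in place of '.' in the curl pipeline shown earlier.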
Next episode

I will look at some of the options to jq next time, though most of them will be revealed as they become relevant. I will also start looking at jq filters in that episode.

Links

- JSON (JavaScript Object Notation):
  - Wikipedia page about JSON
  - Standards:
    - RFC 8259: The JavaScript Object Notation (JSON) Data Interchange Format
    - ECMA-404: The JSON data interchange syntax
- jq:
  - GitHub page
  - Downloading jq
  - The jq manual
  - Wikipedia page about the jq programming language
- MrX’s show on using the HPR statistics in JSON:
  - Modifying a Python script with some help from ChatGPT

Publishing date: 2024-04-25 00:00
Link: https://hackerpublicradio.org/eps/hpr4104/index.html
Contributors: perloid.nospam@nospam.autistici.org (Dave Morriss), author
Enclosures: http://hackerpublicradio.org/eps/hpr4104.mp3 (audio/mpeg)