Skip to content

Latest commit

 

History

History
1127 lines (856 loc) · 41.3 KB

README.md

File metadata and controls

1127 lines (856 loc) · 41.3 KB

$ jb name=json.bash creates=JSON

json.bash is a command-line tool and bash library that creates JSON.

$ jb name=json.bash creates=JSON dependencies:[,]=Bash,Grep
{"name":"json.bash","creates":"JSON","dependencies":["Bash","Grep"]}

$ # Values are strings unless explicitly typed
$ jb id=42 size:number=42 surname=null data:null
{"id":"42","size":42,"surname":"null","data":null}

$ # Reference variables with @name
$ id=42 date=2023-06-23 jb @id created@date modified@date
{"id":"42","created":"2023-06-23","modified":"2023-06-23"}

$ # Pull data from files
$ printf hunter2 > /tmp/password; jb @/tmp/password
{"password":"hunter2"}

$ # Pull data from shell pipelines
$ jb sizes:number[]@<(seq 1 4)
{"sizes":[1,2,3,4]}

$ # Nest jb calls
$ jb type=club members:json[]@<(jb name=Bob; jb name=Alice)
{"type":"club","members":[{"name":"Bob"},{"name":"Alice"}]}

$ # The Bash API can reference arrays and create JSON efficiently — without forking
$ source json.bash
$ out=people json name=Bob; out=people json name=Alice; sizes=(42 91 2)
$ id="abc.123" json @id @sizes:number[] @people:json[]
{"id":"abc.123","sizes":[42,91,2],"people":[{"name":"Bob"},{"name":"Alice"}]}

json.bash's one thing is to get shell-native data (environment variables, files, program output) to somewhere else, using JSON encapsulate it robustly.

Creating JSON from the command line or a shell script can be useful when:

  • You need some ad-hoc JSON to interact with a JSON-consuming application
  • You need to bundle up some data to share or move elsewhere. JSON can a good alternative to base64-encoding or a file archive.

It does no transformation or filtering itself, instead it pulls data from things you already know how to use, like files, command-line arguments, environment variables, shell pipelines and shell scripts. It glues together data from these sources, giving it enough structure to make the data easy to consume reliably in downstream programs.

It's something like a reverse tee — it pulls together data sources, using JSON to represent the aggregation. It's not an alternative to to data-processing tools like jq, rather it helps assemble JSON to send into JSON-consuming tools like jq.


Contents

  1. Install
  2. How-to guides
  3. Background & performance notes
  4. Credits

Install

Container image

We publish the container image ghcr.io/h4l/json.bash/jb with jb-* and json.bash, perhaps useful to try without installing.

$ docker container run --rm ghcr.io/h4l/json.bash/jb msg=Hi
{"msg":"Hi"}

$ # Get a bash shell to try things interactively
$ docker container run --rm -it ghcr.io/h4l/json.bash/jb
bash-5.2# jb os-release:{}@<(xargs < /etc/os-release env -i)
{"os-release":{"NAME":"Alpine Linux","ID":"alpine","VERSION_ID":"3.18.2","PRETTY_NAME":"Alpine Linux v3.18","HOME_URL":"https://alpinelinux.org/","BUG_REPORT_URL":"https://gitlab.alpinelinux.org/alpine/aports/-/issues"}}

OS Packages

Expand this for details…

Auto-generated packages

Externally-maintained packages are not yet widely available, so we provide a way to generate a package file for any package manager supported by fpm (at least apk, deb, freebsd, rpm, sh (self extracting), tar, possibly more).

We publish the container image ghcr.io/h4l/json.bash/pkg that can generate a package file in whichever format you like:

$ docker container run --rm -v "$(pwd):/pkg" ghcr.io/h4l/json.bash/pkg deb
Generating: /pkg/json.bash_0.2.2-dev.deb

$ ls
json.bash_0.2.2-dev.deb
$ dpkg -i /pkg/json.bash_0.2.2-dev.deb

Externally-maintained packages

These packages have been created and maintained by downstream users/projects.

OS Repository Details
Arch linux User Repository https://aur.archlinux.org/packages/json-bash-git
json.bash needs you!

If you'd like json.bash to be available from your operating system's package manager, you can help by investigating your package manager's process for adding new packages to see if you can contribute a package, or make a request to add it.

Manual install

Installing manually is quite straightforward.

Expand this for instructions…
# Alternatively, use /usr/local/bin to install system-wide
cd ~/.local/bin
curl -fsSL -O "https://raw.githubusercontent.com/h4l/json.bash/HEAD/json.bash"
chmod +x json.bash
ln -s json.bash jb
ln -s json.bash jb-array

# If your shell is bash, you can alias jb and jb-array to the bash functions for
# better performance. You should add this line to your ~/.bashrc
source json.bash; alias jb=json jb-array=json.array

# Optional: if you'd also like jb-echo, jb-cat, jb-stream
for name in jb-echo jb-cat jb-stream; do
  curl -fsSL -O "https://raw.githubusercontent.com/h4l/json.bash/HEAD/bin/${name:?}"
  chmod +x "${name:?}"
done

To uninstall, remove json.bash, jb, jb-array, jb-echo, jb-cat and jb-stream from the directory you installed them to (run which -a json.bash to find where it is).

How-to guides

  1. The json.bash commands
  2. Object keys
  3. Object values
  4. Arrays (mixed types, fixed length)
  5. Argument types
  6. Array values (uniform types, variable length)
  7. Object values (uniform types, variable length)
  8. ... arguments (merge entries into the host object/array)
  9. Missing / empty values
  10. Nested JSON with :json and :raw types
  11. File references
  12. Argument structure
  13. Error handling
  14. Security and correctness
  15. jb-cat, jb-echo, jb-stream utility programs
  16. Streaming output
  17. Configuring environment variables

These examples mostly use jb, which is the json.bash library run as a stand-alone program. From within a bash script you get better performance by running source json.bash and using the json bash function, which is a superset of stand-alone jb and much faster because it doesn't execute new child processes when called. See the Background & performance notes section for more.

The json.bash commands

jb / jb-array / json / json.array

$ # The jb program creates JSON objects
$ jb
{}

$ # The jb-array creates arrays, but otherwise works like jb.
$ jb-array :number=4
[4]

$ # From a bash shell or bash script, use the json and json.array functions
$ source json.bash  # no path is needed if json.bash is on $PATH
$ json
{}

$ # json.array creates arrays, but otherwise works like json
$ json.array
[]

Each argument defines an entry in the object or array. Arguments can contain a key, type and value in this structure:

A railroad syntax diagram showing a high-level summary of the key, type and value structure of an argument.

The Argument structure section has more details.

Object keys

Each argument creates an entry in the JSON object. The first part of each argument defines the key.

$ jb msg=hi
{"msg":"hi"}

$ # Keys can contain most characters (except @:=, unless escaped)
$ jb "🐚"=JSON
{"🐚":"JSON"}

$ # Key values can come from variables
$ key="The Message" jb @key=hi
{"The Message":"hi"}

$ # Key variables can contain any characters
$ key="@key:with=reserved-chars" jb @key=hi
{"@key:with=reserved-chars":"hi"}

$ # Each argument defines a key
$ var=c jb a=X b=Y @var=Z
{"a":"X","b":"Y","c":"Z"}

$ # Keys may be reused, but should not be, because JSON parser behaviour for
$ # duplicate keys is undefined.
$ jb a=A a=B a=C
{"a":"A","a":"B","a":"C"}

$ # The reserved characters can be escaped by doubling them
$ jb =@@handle=ok a::z=ok 1+1==2=ok
{"@handle":"ok","a:z":"ok","1+1=2":"ok"}

Object values

The last part of each argument after a = or @ defines the value. Values can contain their value directly, or reference a variable or file.

$ jb message="Hello World"
{"message":"Hello World"}

$ greeting="Hi there" jb message@greeting
{"message":"Hi there"}

$ # Variable references without a value define the key and value in one go.
$ greeting="Hi" name=Bob jb @greeting @name
{"greeting":"Hi","name":"Bob"}

$ # This also applies (less usefully) to inline entries.
$ jb message
{"message":"message"}

$ # Inline values following a `=` have no content restrictions.
$ jb message=@value:with=reserved-chars
{"message":"@value:with=reserved-chars"}

$ # @ values that begin with / or ./ are references to files
$ printf hunter2 > /tmp/password; jb secret@/tmp/password
{"secret":"hunter2"}

$ # File references without a value define the key and value in one go.
$ jb @/tmp/password
{"password":"hunter2"}

File references are more powerful than they might first appear, as they enable all sorts of dynamic content to be pulled into JSON data, including nested jb calls. See File references.

Arrays (mixed types, fixed length)

Creating arrays is much like creating objects — arguments hold values, either directly, or referencing variables or files.

$ jb-array Hi "Bob Bobson"
["Hi","Bob Bobson"]

$ message=Hi name="Bob Bobson" jb-array @message @name
["Hi","Bob Bobson"]

$ printf 'Bob Bobson' > /tmp/name
$ jb-array Hi @/tmp/name
["Hi","Bob Bobson"]

$ # Array values in arguments cannot contain @:= characters (unless escaped by
$ # doubling them), because they would clash with @variable and :type syntax.
$ # However, values following a = can contain anything, so long as they follow a
$ # key or type section.
$ jb-array :='@foo:bar=baz' :='{"not":"parsed"}' =@@es::cap==ed
["@foo:bar=baz","{\"not\":\"parsed\"}","@es:cap=ed"]

$ # Values from variables have no restrictions. Arrays use the same argument
$ # syntax as objects, so values in the key or value position work the same.
$ s1='@foo:bar=baz' s2='{"not":"parsed"}' jb-array @s1: :@s2
["@foo:bar=baz","{\"not\":\"parsed\"}"]

$ # It's possible to set a key as well as value for array entries, but the key
$ # is ignored.
$ a=A b=B jb-array @a@a @b=B c=C
["A","B","C"]

jb-array is best for creating tuple-like arrays with a fixed number of entries with a mix of types. Use value arrays to create variable-length arrays containing the same type.

json.array is the Bash API equivalent of jb-array.

Argument types

Values are strings unless explicitly typed.

$ # These arguments are strings because they don't use a type
$ jb data=42 surname=null favourite_word=true
{"data":"42","surname":"null","favourite_word":"true"}

$ # Non-string values need explicit types
$ jb size:number=42
{"size":42}

$ # true/false/null have types which don't require redundant values
$ jb active:true enabled:false data:null
{"active":true,"enabled":false,"data":null}

$ # Regardless, they can be given values if desired
$ jb active:true=true enabled:false=false data:null=null
{"active":true,"enabled":false,"data":null}

$ # The bool type allows either true or false values.
$ active=true jb @active:bool enabled:bool=false
{"active":true,"enabled":false}

$ # The auto type outputs true/false/null and number values.
$ jb a:auto=42 b:auto=Hi c:auto=true d:auto=false e:auto=null f:auto=[] g:auto={}
{"a":42,"b":"Hi","c":true,"d":false,"e":null,"f":"[]","g":"{}"}

$ # auto can be used selectively like other types
$ data=42 jb a=42 b:auto=42 c:auto@data
{"a":"42","b":42,"c":42}

$ # In the Bash API (but not yet the jb CLI), the default type can be changed
$ # using the json_defaults option. First you create a named defaults set:
$ source json.bash
$ json.define_defaults num :number

$ # Then use the name with json_defaults when calling json to use your defaults
$ json_defaults=num json data=42
{"data":42}

$ # In which case strings need to be explicitly typed
$ json_defaults=num json data=42 msg=Hi
json.encode_number(): not all inputs are numbers: 'Hi'
json(): Could not encode the value of argument 'msg=Hi' as a 'number' value. Read from inline value.
�␘

$ json_defaults=num json data=42 msg:string=Hi
{"data":42,"msg":"Hi"}
Why does json.bash require explicit types?

Why does json.bash require explicit types?

Type coercion can look good in demos, but my opinion is that in practice, fields are more commonly of a specific type than a union of several options, so coercing types by default makes it harder to achieve correct behaviour in the common case. The Norway Problem is worth reading about if you're not familiar with it.

Regardless, you can make the :auto type the default by using json_defaults when calling json from the Bash API (as demonstrated above). (This isn't yet exposed through the jb CLI.)

Array values (uniform types, variable length)

Arrays of values can be created using [] after the : type marker.

$ jb sizes:number[]=42
{"sizes":[42]}

$ # The value is split on the character inside the []
$ jb names:[,]="Alice,Bob,Dr Chris"
{"names":["Alice","Bob","Dr Chris"]}

$ # Using a newline \n as the split character makes each line an array
$ # element. This integrates with line-oriented command-line tools:
$ jb sizes:number[$'\n']="$(seq 3)"
{"sizes":[1,2,3]}

File references with process substitution are the a better way to get the output of other programs into JSON though.

$ # The default type is used if the type name is left out
$ jb sizes:[,]="1,2,3"
{"sizes":["1","2","3"]}

$ # [:] is shorthand for /collection=array,split=:/
$ jb names:/collection=array,split=:/="Alice:Bob:Dr Chris"
{"names":["Alice","Bob","Dr Chris"]}

$ # To split on null bytes, use split= (empty string). When used with inline and
$ # bash values this effectively inhibits splitting, because bash variables
$ # can't contain null bytes.
$ printf 'AB\nCD\x00EF\nGH\n\x00' | jb nullterm:[]/split=/@/dev/stdin
{"nullterm":["AB\nCD","EF\nGH\n"]}

$ # When using the Bash API, @var references can be bash arrays
$ source json.bash
$ names=("Bob Bobson" "Alice Alison") sizes=(42 55)
$ json @names:string[] @sizes:number[]
{"names":["Bob Bobson","Alice Alison"],"sizes":[42,55]}

$ # json.array values can be arrays too
$ json.array @names:string[] @sizes:number[] :null[] :bool[]=true
[["Bob Bobson","Alice Alison"],[42,55],[null],[true]]

$ # And jb-array values can be arrays as well
$ jb-array :[,]="Bob Bobson,Alice Alison" :number[,]=42,55 :null[] :bool[]=true
[["Bob Bobson","Alice Alison"],[42,55],[null],[true]]

Arrays can be created from existing JSON arrays using the [:json] array format:

$ jb tags:[:json]="$(jb-array foo bar baz)"
{"tags":["foo","bar","baz"]}

$ # The type of values in the argument's array must match the argument type
$ jb measures:number[:json]='[1,2,3,4]'
{"measures":[1,2,3,4]}

$ # Otherwise an error occurs
$ jb measures:number[:json]='[1,2,"oops"]'
json.encode_array_entries_from_json(): provided entries are not all valid JSON arrays with 'number' values — '[1,2,"oops"]'
json(): Could not encode the value of argument 'measures:number[:json]=[1,2,"oops"]' as an array with 'number' values. Read from inline value, without splitting (one chunk), interpreted chunks with 'json' format.
�␘

Object values (uniform types, variable length)

Variable-length JSON objects can be created using {} after the : type marker. Object values use the same key=value syntax used in arguments' attributes section (:/a=b,c=d/).

$ # The default type is used if the type name is left out
$ jb sizes:{}=small=s,medium=m,large=l
{"sizes":{"small":"s","medium":"m","large":"l"}}

$ jb measurements:number{}=small=5,medium=10,large=15
{"measurements":{"small":5,"medium":10,"large":15}}

Like array values ([]), object values consume multiple lines of input when reading files

$ # env is a command-line tool that prints environment variables
$ env -i small=s medium=m large=l
small=s
medium=m
large=l

$ # We can encode variables from env as a JSON object
$ env -i small=s medium=m large=l | jb sizes:{}@/dev/stdin
{"sizes":{"small":"s","medium":"m","large":"l"}}

As with array values, JSON data can be used as values:

$ jb user=h4l repo=json.bash >> info
$ jb @./info:{:json}
{"info":{"user":"h4l","repo":"json.bash"}}

$ jb file_types:string[,]=bash,md,hcl year_created:number=2023 >> info
$ # The values of the JSON objects are validated to match the argument's type,
$ # so the :json type must be used to consume arbitrary JSON
$ jb @./info:json{:json}
{"info":{"user":"h4l","repo":"json.bash","file_types":["bash","md","hcl"],"year_created":2023}}

... arguments (merge entries into the host object/array)

An argument prefixed with ... (commonly called splat, spread or unpacking in programming languages) results in the argument's entries being merged directly into the object or array being created.

$ jb id=ab12 ...:=user=h4l,repo=json.bash ...:number=year=2023,min_radish_count=3
{"id":"ab12","user":"h4l","repo":"json.bash","year":2023,"min_radish_count":3}

$ seq 5 8 | jb-array :number=0 ...:number[,]=1,2,3,4 ...:number@/dev/stdin
[0,1,2,3,4,5,6,7,8]

Missing / empty values

References to undefined variables, missing files or unreadable files are missing values. Empty array variables, empty string variables, empty files and empty argument values are empty values.

Missing or empty keys or values are errors by default, apart from empty argument values, like foo=.

The flags + ~ ? and ?? alter how missing/empty values behave.

Flag Name Effect
+ strict All missing/empty values are errors.
~ optional Missing files/variables are treated as empty.
? substitute empty Empty values are substituted with a default.
?? omit empty Entries with an empty key or value are omitted.
$ # empty argument values are substituted by default
$ jb str= num:number= bool:bool= arr:[]= obj:{}=
{"str":"","num":0,"bool":false,"arr":[],"obj":{}}

$ # Using ? substitutes the empty var for the default string, which is ""
$ empty= jb @empty?
{"empty":""}

$ # The empty attribute controls the default value. It's interpreted as JSON.
$ CI=true jb ci:bool/empty=false/?@CI
{"ci":true}

$ CI= jb ci:true/empty=false/?@CI
{"ci":false}

$ # empty_key controls the default value for empty keys
$ PROP= jb ?@PROP:true/empty_key='"🤷"'/
{"🤷":true}

$ # The type= can be used to encode a raw value as JSON for empty attributes
$ PROP=👌 jb ?@PROP:true/empty_key=string=🤷/
{"👌":true}

$ # ?? causes an empty value to be omitted entirely
$ CI= jb ci:bool??@CI
{}

$ # ~ causes a missing value to be empty. A ? is needed to prevent the empty
$ # value being an error.
$ jb github_actions:bool~?@GITHUB_ACTION
{"github_actions":false}

$ # Empty variables are errors if ? isn't used.
$ empty= jb @empty
json.apply_empty_action(): The value of argument '@empty' must be non-empty but is empty.
json(): Could not encode the value of argument '@empty' as a 'string' value. Read from variable $empty. (Use the '?' flag after the :type to substitute the entry's empty value with a default, or the '??' flag to omit the entry when it has an empty value.)
�␘

$ # (Only the json Bash function, not the jb executable can access bash array variables.)
$ . json.bash
$ empty_array=()

$ # Using ? substitutes the empty array for the default, which is []
$ json @empty_array:[]?
{"empty_array":[]}

$ # Empty arrays are errors without ?.
$ json @empty_array:[]
json.apply_empty_action(): The value of argument '@empty_array:[]' must be non-empty but is empty.
json(): Could not encode the value of argument '@empty_array:[]' as an array with 'string' values. Read from array-variable $empty_array. (Use the '?' flag after the :type to substitute the entry's empty value with a default, or the '??' flag to omit the entry when it has an empty value.)
�␘

$ # Missing / empty files work like variables
$ jb @./config:/empty=null/~?
{"config":null}

Nested JSON with :json and :raw types

Nested objects and arrays are created using the :json or :raw types. The :json type validates the provided value(s) and fails if they're not actually JSON, whereas the :raw type allow any value to be inserted (even invalid JSON).

The reason for both is that :json depends on grep (with PCRE) being present, so :raw can be used in situations where only bash is available, and validation isn't necessary (e.g. when passing the output of one json.bash call into another). :raw also supports streaming output, which :json does not.

$ # Like other types, :json and :raw values can be directly embedded in arguments
$ jb user:json='{"name":"Bob Bobson"}'
{"user":{"name":"Bob Bobson"}}

$ # Or come from variable references
$ user='{"name":"Bob Bobson"}' jb @user:json
{"user":{"name":"Bob Bobson"}}

$ # Or files
$ jb name="Bob Bobson" > /tmp/user; jb @/tmp/user:json
{"user":{"name":"Bob Bobson"}}

$ # Arrays of JSON work the same way as other types.
$ jb users:json[$'\n']="$(jb name=Bob; jb name=Alice)"
{"users":[{"name":"Bob"},{"name":"Alice"}]}

$ # :json and :raw values are not formatted — whitespace in them is preserved
$ jb user:json=$'{\n  "name": "Bob Bobson"\n}'
{"user":{
  "name": "Bob Bobson"
}}

$ # :json detects invalid JSON and fails with an error
$ jb oops:json='{"truncated":'
json.encode_json(): not all inputs are valid JSON: '{"truncated":'
json(): Could not encode the value of argument 'oops:json={"truncated":' as a 'json' value. Read from inline value.
�␘

$ # However :raw performs no validation, so it must only be used with great care
$ # 🚨 This emits invalid JSON without failing! 🚨
$ jb broken:raw='{"truncated":'
{"broken":{"truncated":}

File references

The @ref syntax can be used to reference the content of files. If an @ref starts with / or ./ it's taken to be a file (rather than a shell variable).

$ printf 'orange #3\nblue #5\n' > colours

$ jb my_colours@./colours
{"my_colours":"orange #3\nblue #5\n"}

$ # The final path segment is used as the key if a key isn't set.
$ jb @./colours
{"colours":"orange #3\nblue #5\n"}

$ # Array values split on newlines
$ jb @./colours:[]
{"colours":["orange #3","blue #5"]}

$ printf 'apple:pear:grape' > fruit

$ # The file can be split on a different character by naming it in the []
$ jb @./fruit:[:]
{"fruit":["apple","pear","grape"]}

$ # Which is shorthand for
$ jb @./fruit:/collection=array,split=:/
{"fruit":["apple","pear","grape"]}

$ # Split on null by setting split to the empty string
$ printf 'foo\nbar\n\x00bar baz\n\x00' > nullterminated
$ jb @./nullterminated:[]/split=/
{"nullterminated":["foo\nbar\n","bar baz\n"]}

$ # Read from stdin using the special /dev/stdin file
$ seq 3 | jb counts:number[]@/dev/stdin
{"counts":[1,2,3]}

File references become especially powerful when combined with process substitution — a shell feature that provides a dynamic, temporary file containing the output of another program.

$ # Use process substitution to nest jb calls and pull multiple shell pipelines
$ # into one JSON output.
$ jb counts:number[]@<(seq 3) \
>    people:json[]@<(jb name=Bob; jb name=Alice)
{"counts":[1,2,3],"people":[{"name":"Bob"},{"name":"Alice"}]}

Aside: Process substitution 101

$ # What's going on when we use process substitution? The <(...) syntax.
$ jb msg@<(printf "Hi!")
{"msg":"Hi!"}

$ # The shell replaces <(...) with a file path. That file contains the output of
$ # the command inside the <(...) when read. (But the catch is, the file only
$ # exists while the command runs, and it's not a normal file, so the contents
$ # isn't stored on disk.)

$ # We can see this if we echo the the substitution:
$ echo This is the substitution result: <(printf "Hi!")
This is the substitution result: /dev/fd/...

$ # If we cat the substitution instead of echoing it, we read the file contents:
$ cat <(printf "Hi!")
Hi!

$ # So when we use this with jb, it's as if we ran:  jb msg@/dev/fd/...

$ # We can see this in action by enabling tracing in Bash:
$ set -o xtrace;  jb msg@<(printf "Hi!");  set +o xtrace
+ jb msg@/dev/fd/...
++ printf 'Hi!'
{"msg":"Hi!"}
+ set +o xtrace

Because <(...) becomes a path, you don't have to quote it, which makes forming commands a bit easier than using command substitution to do the same thing (echo "$(printf like this)"). And you only pass a short file path as an argument, not a potentially huge string.

Back to file references

$ # Process substitution can nest multiple times
$ jb owners:json@<(
>   jb people:json[]@<(jb name=Bob; jb name=Alice)
> )
{"owners":{"people":[{"name":"Bob"},{"name":"Alice"}]}}

$ # Files can be referenced indirectly using a shell variable.
$ # If @var is used and $var is not set, but $var_FILE is, the filename is read
$ # from $var_FILE and the content of the file is used.
$ printf 'secret123' > db_password
$ db_password_FILE=./db_password jb @db_password
{"db_password":"secret123"}

(This pattern is often used to securely pass secrets via environment variables, without directly exposing the secret's value itself in the environment, to avoid accidental exposure.)

$ # Nesting lots of process substitution levels can become unwieldy, but we can
$ # flatten the nesting by holding the process substitution filenames in shell
$ # variables, using the _FILE var feature to reference them:
$ people_FILE=<(jb name=Bob; jb name=Alice) \
> owners_FILE=<(jb @people:json[]) \
> jb @owners:json
{"owners":{"people":[{"name":"Bob"},{"name":"Alice"}]}}

Argument structure

Arguments have 3 main parts: a key, type and value. The structure (omitting some details for clarity) is:

A railroad syntax diagram showing the key, type and value structure of an argument, in more detail than the minimal argument diagram, but still omitting some details.

The Argument syntax page has more detail.

Error handling

json.bash aims to fail quickly, cleanly and clearly when problems happen.

Please open an issue if you discover a case where an error goes unreported, is not reported clearly, or you find it's not easy to prevent incorrect data getting generated.

Invalid values in typed arguments will cause an error — values are not coerced if a type is specified. :bool and :null are pedantic — values must be exactly true / false / null.

$ active=tRuE jb @active:bool
json.encode_bool(): not all inputs are bools: 'tRuE'
json(): Could not encode the value of argument '@active:bool' as a 'bool' value. Read from variable $active.
�␘

Errors are reported with specific exit statuses:

$ # Errors in user-provided data fail with status 1
$ jb data:json='invalid'; echo status=$?
json.encode_json(): not all inputs are valid JSON: 'invalid'
json(): Could not encode the value of argument 'data:json=invalid' as a 'json' value. Read from inline value.
�␘
status=1

$ # Errors in developer-provided arguments fail with status 2
$ jb bad_arg:cheese; echo status=$?
json.parse_argument(): type name must be one of auto, bool, false, json, null, number, raw, string or true, but was 'cheese'
json(): Could not parse argument 'bad_arg:cheese'. Argument is not structured correctly, see --help for examples.
�␘
status=2

$ # Arguments referencing variables that don't exist fail with status 3
$ jb @missing; echo status=$?
json(): Could not process argument '@missing'. Its value references unbound variable $missing. (Use the '~' flag after the :type to treat a missing value as empty.)
�␘
status=3

$ # Arguments referencing files that don't exist fail with status 4
$ jb @/does/not/exist; echo status=$?
/.../bin/jb: line ...: /does/not/exist: No such file or directory
json(): Could not open the file '/does/not/exist' referenced as the value of argument '@/does/not/exist'.
�␘
status=4

jb can detect errors in upstream jb calls that are pulled into a downstream jb process, such as when several jb calls are fed into each other using process substitution.

$ # The jb call 3 levels deep reading the missing file ./not-found fails
$ jb club:json@<(
>   jb name="jb Users" members:json[]@<(
>     jb name=h4l; jb name@./not-found
>   )
> )
...: ./not-found: No such file or directory
json(): Could not open the file './not-found' referenced as the value of argument 'name@./not-found'.
json.encode_json(): not all inputs are valid JSON: '{"name":"h4l"}' $'\030'
json(): Could not encode the value of argument 'members:json[]@/dev/fd/...' as an array with 'json' values. Read from file /dev/fd/..., split into chunks on $'\n', interpreted chunks with 'raw' format.
json.encode_json(): not all inputs are valid JSON: $'\030'
json(): Could not encode the value of argument 'club:json@/dev/fd/...' as a 'json' value. Read from file /dev/fd/..., up to the first 0x00 byte or end-of-file.
�␘

Notice the symbol in the output? It's the Unicode symbol for the Cancel control character.

jb propagates errors by emitting a Cancel control character when it fails, which causes its output to be invalid JSON, which prevents the erroneous output from being parsed by downstream JSON-consuming programs (jb or otherwise). We call this Stream Poisoning, because the Cancel control character poisons the output, and this poisoned output flows downstream until it's detected.

The result of this is that it's safe to pipe the output of a jb program into another JSON-consuming program, with the knowledge that you'll get an error if something has failed upstream, without needing to meticulously collect and check the every exit status of every program contributing to the output.

See docs/stream-poisoning.md for more on how this works.

Security and correctness

jb can safely generate JSON from untrusted user input, but there are some ways to get this wrong.

It's safe to use untrusted input in:

  • Inline values — after the value's = in an argument.

    With argument key:type=value Anything after the value's = in an argument is used as-is and not interpreted/unescaped, so it can contain untrusted input.

    The = must be preceded by a key or : (type section marker), otherwise an argument starting with a = such as =foo is parsed as a key, which could allow text inserted into the argument to be parsed as the value if not escaped correctly.

  • Variable references — the value held in a variable referenced by an argument.

    With argument @foo, the value of $foo is is used as-is and not interpreted/unescaped, so it can contain untrusted input.

  • File references — the contents of a file referenced by a argument.

    The contents of files is not interpreted/unescaped, so they can contain untrusted input.

In general, avoid inserting user-provided input into the argument string passed to jb before the value's =. To create dynamic object property names from user input, store the user-provided value in a variable or file, and use an @ref to reference it:

$ dynamic_prop='Untrusted' jb @dynamic_prop=value
{"Untrusted":"value"}

If you format user-input into an argument string, they could insert an @ref of their choice, and pull in a file or variable they shouldn't have access to. You can escape special characters in argument values by doubling characters, but it's safer to use an @ref — if you get an @ref wrong you get an error, whereas if you get escaping wrong, you may create a vulnerability.

References are not supported when specifying argument attributes, like /empty=x/, so , in these values needs to be escaped by doubling it. E.g. to use a comma as a split char, use /empty=string='Why,, yes.'/:

$ empty= jb msg:/empty=string='Why,, yes.'/??@empty
{"msg":"Why, yes."}

To pass a dynamic file location, use a _FILE variable reference or read the file with a normal shell construct and redirect the input. You must also separately validate that the referenced file should be accessible.

$ printf 'Example\nContent\n' > /tmp/example
$ user_file=/tmp/example

$ user_specified_FILE=$user_file jb user_file_content@user_specified
{"user_file_content":"Example\nContent\n"}

$ jb user_file_content@<(cat "$user_file")
{"user_file_content":"Example\nContent\n"}

$ jb user_file_content="$(<"$user_file")"  # $() strips the trailing newline
{"user_file_content":"Example\nContent"}

Environment variable exposure

jb @var refs have the advantage over normal shell $var refs in that they are not expanded by the shell before executing the command, so sensitive values in shell variables are not exposed as process arguments when using @var:

$ password=hunter2

$ # shell $var — secret's value is in process arguments
$ jb password="$password" visible_args:[]/split=/@/proc/self/cmdline
{"password":"hunter2","visible_args":["bash","/.../bin/jb","password=hunter2","visible_args:[]/split=/@/proc/self/cmdline"]}

$ # jb @var — only the variable name is in process arguments
$ password=$password jb @password visible_args:[]/split=/@/proc/self/cmdline
{"password":"hunter2","visible_args":["bash","/.../bin/jb","@password","visible_args:[]/split=/@/proc/self/cmdline"]}

jb-cat, jb-echo, jb-stream utility programs

json.bash has a few single-purpose utility programs that were originally demo programs for the Bash API, but could be use useful by themselves:

$ # jb-echo is like echo, but each argument becomes a string element in a JSON array
$ jb-echo foo "bar baz" boz
["foo","bar baz","boz"]

$ printf 'The Cat\nsat on\nthe mat.\n' > catmat
$ printf 'The Bat\nhid in\nthe hat.\n' > bathat

$ # jb-cat is like cat, but the output is stream-encoded as a single JSON string
$ jb-cat catmat bathat
"The Cat\nsat on\nthe mat.\nThe Bat\nhid in\nthe hat.\n"

$ # jb-stream is a filter program that encodes each input line as a JSON string
$ cat catmat bathat | jb-stream
"The Cat"
"sat on"
"the mat."
"The Bat"
"hid in"
"the hat."

Streaming output

By default jb collects output in a buffer and outputs it all at once at the end. This has the advantage that it does not emit partial output if an error occurs mid-way through.

However, setting the JSON_BASH_STREAM=true makes jb output content incrementally. jb can stream-encode values it's pulling from file references:

  • Single string values from files are stream-encoded
  • Arrays of any type coming from files are stream-encoded (individual elements are fully buffered), but elements are emitted incrementally
  • :raw values from files are streamed

:json values can't be streamed unfortunately — jb (ab)uses grep to validate JSON using PCRE's recursive matching features, but sadly grep buffers complete inputs, even when backtracking and matched-region output are disabled.

Argument syntax details

The full syntax of jb arguments is documented in a (pseudo) grammar in hack/syntax_patterns.bash.

Configuring environment variables

json.bash can be configured by setting certain environment variables:

JSON_BASH_GREP

The command(s) to run to start the JSON validator grep co-process.

Default: ggrep:grep

json.bash uses a regular expression to validate JSON data. It uses a GNU grep co-process to validate lines of JSON data using this regex (specifically GNU grep, because it's a PCRE expression).

This environment variable contains a list of command names or absolute paths to execute. Multiple commands are separated with :. The first command that exists is used, so the list can contain missing commands, so long as one exists.

The default of ggrep:grep means ggrep will be used if it's available, otherwise grep.

Background & performance notes

Quite reasonably, you may be wondering why anyone would use Bash to implement a JSON encoder. Won't that be ridiculously slow? I thought so too. Initially, I just wanted to encode JSON strings from Bash without needing to depend on a separate program. My initial few attempts at this were indeed hideously slow. But after a few iterations I was able to get decent performance by operating only on entire strings (or arrays of strings) (not byte-by-byte, or string-by-string for arrays), and absolutely avoiding any forking of subshells.

If you don't fork, and minimise the number of Bash-level operations, Bash can do surprisingly well. Of course, performance still can't compare with a C program. Well, that depends what you're measuring. Because starting a new process can be surprisingly slow. So a race between json.bash and program like jq or jo is a bit like a 100m race between a tortoise and a hare, where the tortoise gets a 1 hour headstart.

If you care about latency rather than throughput, calling json from an already-running Bash script is a little faster than running a separate jo process. And significantly faster than running jq, which is really slow to start.

There's a very basic benchmark script at hack/hot_loop.bash:

$ time hack/hot_loop.bash json.bash 10000 > /dev/null
json.bash

real    0m8.193s
user    0m8.174s
sys     0m0.019s

$ time hack/hot_loop.bash jo 10000 > /dev/null
jo

real    0m9.393s
user    0m2.566s
sys     0m7.386s

$ # Note: 1000 not 10_000
$ time hack/hot_loop.bash jq 1000 > /dev/null
jq

real    0m20.453s
user    0m19.127s
sys     0m1.386s

If we just use json.bash's json.encode_string encoding function to manually construct the JSON (not the full argument parsing stuff) we can do a lot better still:

$ time hack/hot_loop.bash custom-json.bash 10000 > /dev/null
custom-json.bash

real    0m1.901s
user    0m1.891s
sys     0m0.011s

This kind of purpose-specific encoding is what I had in mind when I started this. I was calling jq lots of times from a Bash script, finding it to be very slow, and wondering if I could start a single jq process and make a kind of tiny RPC protocol, sending it JSON from the Bash script to avoid the startup delay on each operation. That would require some ability to encode JSON from Bash.

I wasn't planning to write something comparable to jo when I started, but idea of a jo-like program that only depends on bash kind of appealed to me. Maybe I should port it to a more suitable language though. The program is a now a lot larger in size and scope than I originally anticipated when starting, I certainly wouldn't have written it in bash if I'd known how large it'd become. 🙃

Credits

  • jo for the general idea of a command-line program that generates JSON
  • tesh which automatically runs and tests the command-line output examples here — it would not be at all practical to maintain these kind of examples without it. With tesh the examples become a beneficial second layer of tests, rather than a maintenance burdon.
  • jq for making it pleasant to use JSON on the command-line and in shell scripts