initial commit of JSON guidelines document
1 parent 70362e6, commit 1363c42
Showing 1 changed file with 23 additions and 0 deletions.
@@ -0,0 +1,23 @@
## Best Practices for JSON Output

This note provides guidance for developers of code that generates JSON output, with the goal of producing JSON that works well with Parquet and `jq`.

#### Principles

- All names and strings must be valid UTF-8, with JSON special characters escaped.
- Data from packets is not trusted to be in the correct format.
- No spaces or dashes in names.
- Prefer lowercase.
- There should be no empty JSON objects.
- For compressibility, highly variable fields (e.g. IP.ID) should be at the tail end of a record, not the front.
- Avoid using network data as JSON keys, so that keys are consistent (and thus Parquet-friendly) and follow the other guidelines.
- There should be no empty JSON arrays (exceptions can be made where semantically necessary, provided we pre-deploy the json2parquet schema).
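
Several of the principles above can be checked mechanically. The following is a minimal sketch of such a check, not part of the project's tooling: the function name `check_record`, the naming pattern (lowercase letters, digits, and underscores), and the sample field names are all assumptions made for illustration.

```python
import json
import re

# Assumed naming rule derived from the principles: lowercase letters,
# digits, and underscores only, so no spaces or dashes.
NAME_RE = re.compile(r"[a-z0-9_]+")


def check_record(value, path="$"):
    """Return a list of guideline violations found in a parsed JSON value."""
    problems = []
    if isinstance(value, dict):
        if not value:
            problems.append(f"{path}: empty object")
        for name, child in value.items():
            if not NAME_RE.fullmatch(name):
                problems.append(f"{path}: bad name {name!r}")
            problems.extend(check_record(child, f"{path}.{name}"))
    elif isinstance(value, list):
        if not value:
            problems.append(f"{path}: empty array")
        for i, child in enumerate(value):
            problems.extend(check_record(child, f"{path}[{i}]"))
    return problems


# Hypothetical records: the first follows the guidelines (the highly
# variable ip_id field comes last), the second breaks three of them.
good = json.loads('{"src_ip": "10.0.0.1", "tls": {"version": "1.3"}, "ip_id": 54321}')
bad = json.loads('{"Src-IP": "10.0.0.1", "tls": {}, "alpn": []}')

for problem in check_record(bad):
    print(problem)
```

A check like this covers only the structural rules (names, empty objects, empty arrays). Field ordering and the use of network data as keys depend on the semantics of each record and would still need review by hand.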

#### Resources

The class `utf8_safe_string`
https://wwwin-github.cisco.com/network-intelligence/mercury-transition/blob/dev/src/libmerc/utf8.hpp#L931
can be used to safely convert packet data into a string that can be
used as e.g. a JSON array or object name.
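
To illustrate the idea of converting untrusted packet bytes into a safe JSON string, here is a small Python sketch. It is not the `utf8_safe_string` implementation and makes no claim about how that class behaves. The function name `safe_json_string` is hypothetical, and replacing invalid bytes with U+FFFD is an assumed policy chosen for the example.

```python
import json


def safe_json_string(raw: bytes) -> str:
    """Convert untrusted bytes into a quoted, escaped JSON string literal."""
    # Invalid UTF-8 sequences become U+FFFD instead of raising an error
    # (an assumed policy for this sketch).
    text = raw.decode("utf-8", errors="replace")
    # json.dumps escapes quotes, backslashes, and control characters.
    return json.dumps(text, ensure_ascii=False)


# Hypothetical packet payloads: one with JSON special characters,
# one that is not valid UTF-8.
print(safe_json_string(b'a"b\\c\n'))
print(safe_json_string(b"\xff\xfeok"))
```

Because decoding never fails and `json.dumps` handles the escaping, the result is always valid UTF-8 that a JSON parser will accept, whatever bytes the packet contained.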