Merge remote-tracking branch 'ClickHouse/master' into patch-3
rschu1ze committed Jan 16, 2025
2 parents cf96789 + 5efa76f commit 29fdd2e
Showing 41 changed files with 1,206 additions and 629 deletions.
31 changes: 8 additions & 23 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -47,29 +47,14 @@ At a minimum, the following information should be added (but add more as needed)

All builds in Builds_1 and Builds_2 stages are always mandatory
and will run independently of the checks below:

- [ ] <!---ci_set_required--> Allow: All Required Checks
- [ ] <!---ci_include_stateless--> Allow: Stateless tests
- [ ] <!---ci_include_stateful--> Allow: Stateful tests
- [ ] <!---ci_include_integration--> Allow: Integration Tests
- [ ] <!---ci_include_performance--> Allow: Performance tests
- [ ] <!---ci_set_builds--> Allow: All Builds
- [ ] <!---batch_0_1--> Allow: batch 1, 2 for multi-batch jobs
- [ ] <!---batch_2_3--> Allow: batch 3, 4, 5, 6 for multi-batch jobs
---
- [ ] <!---ci_exclude_style--> Exclude: Style check
- [ ] <!---ci_exclude_fast--> Exclude: Fast test
- [ ] <!---ci_exclude_asan--> Exclude: All with ASAN
- [ ] <!---ci_exclude_tsan|msan|ubsan|coverage--> Exclude: All with TSAN, MSAN, UBSAN, Coverage
- [ ] <!---ci_exclude_aarch64|release|debug--> Exclude: All with aarch64
- [ ] <!---ci_exclude_release--> Exclude: All with release
- [ ] <!---ci_exclude_debug--> Exclude: All with debug
- [ ] <!---ci_include_stateless--> Only: Stateless tests
- [ ] <!---ci_include_stateful--> Only: Stateful tests
- [ ] <!---ci_include_integration--> Only: Integration tests
- [ ] <!---ci_include_performance--> Only: Performance tests
---
- [ ] <!---ci_include_uzz--> Run only fuzzers related jobs (libFuzzer fuzzers, AST fuzzers, BuzzHouse, etc.)
- [ ] <!---ci_exclude_ast--> Exclude: AST fuzzers
- [ ] <!---ci_exclude_style--> Skip: Style check
- [ ] <!---ci_exclude_fast--> Skip: Fast test
---
- [ ] <!---do_not_test--> Do not test
- [ ] <!---woolen_wolfdog--> Woolen Wolfdog
- [ ] <!---upload_all--> Upload binaries for special builds
- [ ] <!---no_merge_commit--> Disable merge-commit
- [ ] <!---woolen_wolfdog--> Non-blocking CI mode (Resource-intensive. All test jobs execute in parallel).
- [ ] <!---no_merge_commit--> Disable merge-commit (Run CI on branch HEAD instead of merge commit with target branch)
- [ ] <!---no_ci_cache--> Disable CI cache
4 changes: 2 additions & 2 deletions docs/en/engines/table-engines/mergetree-family/annindexes.md
@@ -71,8 +71,8 @@ ORDER BY id;
USearch indexes are currently experimental, to use them you first need to `SET allow_experimental_vector_similarity_index = 1`.
:::

The index can be built on a column of type [Array(Float64)](../../../sql-reference/data-types/array.md),
[Array(Float32)](../../../sql-reference/data-types/array.md), or [Array(BFloat16)](../../../sql-reference/data-types/array.md).
The index can be built on columns of type [Array(Float64)](../../../sql-reference/data-types/array.md) or
[Array(Float32)](../../../sql-reference/data-types/array.md).

Index parameters:
- `method`: Currently only `hnsw` is supported.
368 changes: 211 additions & 157 deletions docs/en/interfaces/cli.md

Large diffs are not rendered by default.

196 changes: 5 additions & 191 deletions docs/en/interfaces/formats.md

Large diffs are not rendered by default.

72 changes: 48 additions & 24 deletions docs/en/interfaces/formats/CSV/CSV.md

Large diffs are not rendered by default.

150 changes: 105 additions & 45 deletions docs/en/interfaces/formats/Template/Template.md
@@ -2,72 +2,142 @@
title : Template
slug : /en/interfaces/formats/Template
keywords : [Template]
input_format: true
output_format: true
alias: []
---

| Input | Output | Alias |
|-------|--------|-------|
| ✔     | ✔      |       |

## Description

This format allows specifying a custom format string with placeholders for values with a specified escaping rule.
For cases where you need more customization than the standard formats offer,
the `Template` format lets you specify your own format string with placeholders for values,
together with escaping rules for the data.

It uses the following settings:

| Setting | Description |
|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------|
| [`format_template_row`](#format_template_row) | Specifies the path to the file which contains format strings for rows. |
| [`format_template_resultset`](#format_template_resultset)                                                | Specifies the path to the file which contains the format string for the result set.                                       |
| [`format_template_rows_between_delimiter`](#format_template_rows_between_delimiter) | Specifies the delimiter between rows, which is printed (or expected) after every row except the last one (`\n` by default) |
| `format_template_row_format` | Specifies the format string for rows [in-line](#inline_specification). |
| `format_template_resultset_format` | Specifies the result set format string [in-line](#inline_specification). |
| Some settings of other formats (e.g. `output_format_json_quote_64bit_integers` when using `JSON` escaping) |                                                                                                                            |

It uses settings `format_template_resultset`, `format_template_row` (`format_template_row_format`), `format_template_rows_between_delimiter` and some settings of other formats (e.g. `output_format_json_quote_64bit_integers` when using `JSON` escaping, see further)
## Settings And Escaping Rules

Setting `format_template_row` specifies the path to the file containing format strings for rows with the following syntax:
### format_template_row

`delimiter_1${column_1:serializeAs_1}delimiter_2${column_2:serializeAs_2} ... delimiter_N`,
The setting `format_template_row` specifies the path to the file which contains format strings for rows with the following syntax:

where `delimiter_i` is a delimiter between values (`$` symbol can be escaped as `$$`),
`column_i` is a name or index of a column whose values are to be selected or inserted (if empty, then column will be skipped),
`serializeAs_i` is an escaping rule for the column values. The following escaping rules are supported:
```
delimiter_1${column_1:serializeAs_1}delimiter_2${column_2:serializeAs_2} ... delimiter_N
```

- `CSV`, `JSON`, `XML` (similar to the formats of the same names)
- `Escaped` (similar to `TSV`)
- `Quoted` (similar to `Values`)
- `Raw` (without escaping, similar to `TSVRaw`)
- `None` (no escaping rule, see further)
Where:

| Part of syntax | Description |
|----------------|-------------------------------------------------------------------------------------------------------------------|
| `delimiter_i` | A delimiter between values (`$` symbol can be escaped as `$$`) |
| `column_i` | The name or index of a column whose values are to be selected or inserted (if empty, then the column will be skipped) |
|`serializeAs_i` | An escaping rule for the column values. |

The following escaping rules are supported:

| Escaping Rule | Description |
|----------------------|------------------------------------------|
| `CSV`, `JSON`, `XML` | Similar to the formats of the same names |
| `Escaped` | Similar to `TSV` |
| `Quoted` | Similar to `Values` |
| `Raw` | Without escaping, similar to `TSVRaw` |
| `None` | No escaping rule - see note below |

:::note
If an escaping rule is omitted, then `None` will be used. `XML` is suitable only for output.
:::

Let's look at an example. Given the following format string:

So, for the following format string:
```
Search phrase: ${s:Quoted}, count: ${c:Escaped}, ad price: $$${p:JSON};
```

The values of the following columns will be printed (if using `SELECT`) or expected (if using `INSERT`)
between the `Search phrase:`, `, count:`, `, ad price: $` and `;` delimiters respectively:

`Search phrase: ${SearchPhrase:Quoted}, count: ${c:Escaped}, ad price: $$${price:JSON};`
- `s` (with escape rule `Quoted`)
- `c` (with escape rule `Escaped`)
- `p` (with escape rule `JSON`)

the values of `SearchPhrase`, `c` and `price` columns, which are escaped as `Quoted`, `Escaped` and `JSON` will be printed (for select) or will be expected (for insert) between `Search phrase:`, `, count:`, `, ad price: $` and `;` delimiters respectively. For example:
For example:

`Search phrase: 'bathroom interior design', count: 2166, ad price: $3;`
- If `INSERT`ing, the line below matches the expected template and would read the values `bathroom interior design`, `2166`, `3` into columns `s`, `c`, `p`.
- If `SELECT`ing, the line below is the output, assuming that the values `bathroom interior design`, `2166`, `3` are already stored in a table under columns `s`, `c`, `p`.

In cases where it is challenging or not possible to deploy format output configuration for the template format to a directory on all nodes in a cluster, or if the format is trivial then `format_template_row_format` can be used to set the template string directly in the query, rather than a path to the file which contains it.
```
Search phrase: 'bathroom interior design', count: 2166, ad price: $3;
```
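
The substitution described above can be sketched in a few lines of Python. This is an illustrative toy, not ClickHouse's implementation: the function `render_row`, the `RULES` table and the simplified escaping functions are assumptions made for this example, and the real escaping rules cover many more cases.

```python
import json
import re

# Hypothetical, heavily simplified escaping rules (assumptions for illustration only).
def quoted(value):
    # Strings are wrapped in single quotes; other values are printed as-is.
    if isinstance(value, str):
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return str(value)

def escaped(value):
    # TSV-like escaping of backslash, tab and newline.
    return str(value).replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")

RULES = {"Quoted": quoted, "Escaped": escaped, "JSON": json.dumps, "None": str}

# Matches either the escaped dollar sign `$$` or a placeholder `${column:rule}`.
PLACEHOLDER = re.compile(r"\$\$|\$\{([^:}]*)(?::([^}]*))?\}")

def render_row(template, row):
    """Fill a row format string from a dict that maps column names to values."""
    def substitute(match):
        if match.group(0) == "$$":
            return "$"
        column = match.group(1)
        rule = match.group(2) or "None"  # an omitted rule falls back to `None`
        return RULES[rule](row[column])
    return PLACEHOLDER.sub(substitute, template)

template = "Search phrase: ${s:Quoted}, count: ${c:Escaped}, ad price: $$${p:JSON};"
print(render_row(template, {"s": "bathroom interior design", "c": 2166, "p": 3}))
```

Note that `$$` is tried before `${...}`, so in `$$${p:JSON}` the first two characters become a literal `$` and the rest is treated as a placeholder.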

The `format_template_rows_between_delimiter` setting specifies the delimiter between rows, which is printed (or expected) after every row except the last one (`\n` by default)
### format_template_rows_between_delimiter

Setting `format_template_resultset` specifies the path to the file, which contains a format string for resultset. Setting `format_template_resultset_format` can be used to set the template string for the result set directly in the query itself. Format string for resultset has the same syntax as a format string for row and allows to specify a prefix, a suffix and a way to print some additional information. It contains the following placeholders instead of column names:
The setting `format_template_rows_between_delimiter` specifies the delimiter between rows, which is printed (or expected) after every row except the last one (`\n` by default).

### format_template_resultset

The setting `format_template_resultset` specifies the path to the file, which contains a format string for the result set.

The format string for the result set has the same syntax as a format string for rows.
It allows for specifying a prefix, a suffix and a way to print some additional information and contains the following placeholders instead of column names:

- `data` is the rows with data in `format_template_row` format, separated by `format_template_rows_between_delimiter`. This placeholder must be the first placeholder in the format string.
- `totals` is the row with total values in `format_template_row` format (when using WITH TOTALS)
- `min` is the row with minimum values in `format_template_row` format (when extremes are set to 1)
- `max` is the row with maximum values in `format_template_row` format (when extremes are set to 1)
- `rows` is the total number of output rows
- `totals` is the row with total values in `format_template_row` format (when using WITH TOTALS).
- `min` is the row with minimum values in `format_template_row` format (when extremes are set to 1).
- `max` is the row with maximum values in `format_template_row` format (when extremes are set to 1).
- `rows` is the total number of output rows.
- `rows_before_limit` is the minimal number of rows there would have been without LIMIT. Output only if the query contains LIMIT. If the query contains GROUP BY, rows_before_limit_at_least is the exact number of rows there would have been without a LIMIT.
- `time` is the request execution time in seconds
- `rows_read` is the number of rows that have been read
- `bytes_read` is the number of bytes (uncompressed) that have been read
- `time` is the request execution time in seconds.
- `rows_read` is the number of rows that have been read.
- `bytes_read` is the number of bytes (uncompressed) that have been read.

The placeholders `data`, `totals`, `min` and `max` must not have an escaping rule specified (or `None` must be specified explicitly). The remaining placeholders may have any escaping rule specified.

:::note
If the `format_template_resultset` setting is an empty string, `${data}` is used as the default value.
:::

For insert queries, the format allows skipping some columns or fields if a prefix or suffix is specified (see example).

### In-line specification {#inline_specification}

Often it is challenging or not possible to deploy the format configurations
(set by `format_template_row`, `format_template_resultset`) for the template format to a directory on all nodes in a cluster.
Furthermore, the format may be so trivial that it does not require being placed in a file.

For these cases, `format_template_row_format` (for `format_template_row`) and `format_template_resultset_format` (for `format_template_resultset`) can be used to set the template string directly in the query,
rather than as a path to the file which contains it.

:::note
The rules for format strings and escape sequences are the same as those for:
- [`format_template_row`](#format_template_row) when using `format_template_row_format`.
- [`format_template_resultset`](#format_template_resultset) when using `format_template_resultset_format`.
:::

## Example Usage

### Selecting Data
Let's look at two examples of how we can use the `Template` format, first for selecting data and then for inserting data.

Select example:
### Selecting Data

``` sql
SELECT SearchPhrase, count() AS c FROM test.hits GROUP BY SearchPhrase ORDER BY c DESC LIMIT 5 FORMAT Template SETTINGS
format_template_resultset = '/some/path/resultset.format', format_template_row = '/some/path/row.format', format_template_rows_between_delimiter = '\n '
```

`/some/path/resultset.format`:

``` text
```text title="/some/path/resultset.format"
<!DOCTYPE HTML>
<html> <head> <title>Search phrases</title> </head>
<body>
@@ -83,15 +153,13 @@ format_template_resultset = '/some/path/resultset.format', format_template_row =
</html>
```

`/some/path/row.format`:

``` text
```text title="/some/path/row.format"
<tr> <td>${0:XML}</td> <td>${1:XML}</td> </tr>
```

Result:

``` html
```html
<!DOCTYPE HTML>
<html> <head> <title>Search phrases</title> </head>
<body>
@@ -113,8 +181,6 @@ Result:

### Inserting Data

Insert example:

``` text
Some header
Page views: 5, User id: 4324182021466249494, Useless field: hello, Duration: 146, Sign: -1
@@ -128,22 +194,16 @@ format_template_resultset = '/some/path/resultset.format', format_template_row =
FORMAT Template
```

`/some/path/resultset.format`:

``` text
```text title="/some/path/resultset.format"
Some header\n${data}\nTotal rows: ${:CSV}\n
```

`/some/path/row.format`:

``` text
```text title="/some/path/row.format"
Page views: ${PageViews:CSV}, User id: ${UserID:CSV}, Useless field: ${:CSV}, Duration: ${Duration:CSV}, Sign: ${Sign:CSV}
```

`PageViews`, `UserID`, `Duration` and `Sign` inside placeholders are names of columns in the table. Values after `Useless field` in rows and after `\nTotal rows:` in suffix will be ignored.
All delimiters in the input data must be strictly equal to delimiters in specified format strings.
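
The strict delimiter matching on input can be sketched as follows. This is a rough Python illustration under stated assumptions, not how ClickHouse parses input: the helper `parse_row` is hypothetical, it returns raw strings without applying any escaping rule, and it uses a simple non-greedy regular expression rather than a real per-rule parser.

```python
import re

# Matches either the escaped dollar sign `$$` or a placeholder `${column:rule}`.
PLACEHOLDER = re.compile(r"\$\$|\$\{([^:}]*)(?::([^}]*))?\}")

def parse_row(template, line):
    """Match `line` against a row format string and return {column: raw text}.

    Hypothetical helper: values are not unescaped, and placeholders with an
    empty column name are matched but dropped from the result.
    """
    pattern = []
    columns = []
    pos = 0
    for match in PLACEHOLDER.finditer(template):
        # Literal delimiter text before the placeholder must match exactly.
        pattern.append(re.escape(template[pos:match.start()]))
        if match.group(0) == "$$":
            pattern.append(re.escape("$"))
        else:
            pattern.append("(.*?)")
            columns.append(match.group(1))
        pos = match.end()
    pattern.append(re.escape(template[pos:]))

    found = re.fullmatch("".join(pattern), line)
    if found is None:
        raise ValueError("input does not match the delimiters of the format string")
    return {name: text for name, text in zip(columns, found.groups()) if name}

row_format = ("Page views: ${PageViews:CSV}, User id: ${UserID:CSV}, "
              "Useless field: ${:CSV}, Duration: ${Duration:CSV}, Sign: ${Sign:CSV}")
line = "Page views: 5, User id: 4324182021466249494, Useless field: hello, Duration: 146, Sign: -1"
print(parse_row(row_format, line))
```

The value after `Useless field:` is consumed but not returned, mirroring the empty placeholder `${:CSV}`, and any deviation in a delimiter raises an error instead of being tolerated.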

## Format Settings



36 changes: 25 additions & 11 deletions docs/en/interfaces/formats/Template/TemplateIgnoreSpaces.md
@@ -2,31 +2,45 @@
title : TemplateIgnoreSpaces
slug : /en/interfaces/formats/TemplateIgnoreSpaces
keywords : [TemplateIgnoreSpaces]
input_format: true
output_format: false
alias: []
---

| Input | Output | Alias |
|-------|--------|-------|
| ✔     | ✗      |       |

## Description

Similar to [`Template`], but skips whitespace characters between delimiters and values in the input stream.
However, if format strings contain whitespace characters, these characters will be expected in the input stream.
Also allows specifying empty placeholders (`${}` or `${:None}`) to split some delimiter into separate parts to ignore spaces between them.
Such placeholders are used only for skipping whitespace characters.
It’s possible to read `JSON` using this format if the values of columns have the same order in all rows.

:::note
This format is suitable only for input.
Similar to `Template`, but skips whitespace characters between delimiters and values in the input stream. However, if format strings contain whitespace characters, these characters will be expected in the input stream. Also allows specifying empty placeholders (`${}` or `${:None}`) to split some delimiter into separate parts to ignore spaces between them. Such placeholders are used only for skipping whitespace characters.
It’s possible to read `JSON` using this format if the values of columns have the same order in all rows. For example, the following request can be used for inserting data from its output example of format [JSON](/docs/en/interfaces/formats/JSON):
:::

## Example Usage

``` sql
INSERT INTO table_name SETTINGS
format_template_resultset = '/some/path/resultset.format', format_template_row = '/some/path/row.format', format_template_rows_between_delimiter = ','
The following query can be used to insert data from the output example of the [JSON](/docs/en/interfaces/formats/JSON) format:

```sql
INSERT INTO table_name
SETTINGS
format_template_resultset = '/some/path/resultset.format',
format_template_row = '/some/path/row.format',
format_template_rows_between_delimiter = ','
FORMAT TemplateIgnoreSpaces
```

`/some/path/resultset.format`:

``` text
```text title="/some/path/resultset.format"
{${}"meta"${}:${:JSON},${}"data"${}:${}[${data}]${},${}"totals"${}:${:JSON},${}"extremes"${}:${:JSON},${}"rows"${}:${:JSON},${}"rows_before_limit_at_least"${}:${:JSON}${}}
```

`/some/path/row.format`:

``` text
```text title="/some/path/row.format"
{${}"SearchPhrase"${}:${}${phrase:JSON}${},${}"c"${}:${}${cnt:JSON}${}}
```

2 changes: 1 addition & 1 deletion docs/en/operations/system-tables/settings.md
@@ -9,7 +9,7 @@ Columns:

- `name` ([String](../../sql-reference/data-types/string.md)) — Setting name.
- `value` ([String](../../sql-reference/data-types/string.md)) — Setting value.
- `changed` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Shows whether a setting is changed from its default value.
- `changed` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Shows whether the setting was explicitly defined in the config or explicitly changed.
- `description` ([String](../../sql-reference/data-types/string.md)) — Short setting description.
- `min` ([Nullable](../../sql-reference/data-types/nullable.md)([String](../../sql-reference/data-types/string.md))) — Minimum value of the setting, if any is set via [constraints](../../operations/settings/constraints-on-settings.md#constraints-on-settings). If the setting has no minimum value, contains [NULL](../../sql-reference/syntax.md#null-literal).
- `max` ([Nullable](../../sql-reference/data-types/nullable.md)([String](../../sql-reference/data-types/string.md))) — Maximum value of the setting, if any is set via [constraints](../../operations/settings/constraints-on-settings.md#constraints-on-settings). If the setting has no maximum value, contains [NULL](../../sql-reference/syntax.md#null-literal).