Merge remote-tracking branch 'ClickHouse/master' into patch-3

den-crane · Jan 16, 2025 · 29fdd2e
2 parents cf96789 + 5efa76f
commit 29fdd2e
Showing 41 changed files with 1,206 additions and 629 deletions.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -47,29 +47,14 @@ At a minimum, the following information should be added (but add more as needed)
 
 All builds in Builds_1 and Builds_2 stages are always mandatory
 and will run independently of the checks below:
-
-- [ ] <!---ci_set_required--> Allow: All Required Checks
-- [ ] <!---ci_include_stateless--> Allow: Stateless tests
-- [ ] <!---ci_include_stateful--> Allow: Stateful tests
-- [ ] <!---ci_include_integration--> Allow: Integration Tests
-- [ ] <!---ci_include_performance--> Allow: Performance tests
-- [ ] <!---ci_set_builds--> Allow: All Builds
-- [ ] <!---batch_0_1--> Allow: batch 1, 2 for multi-batch jobs
-- [ ] <!---batch_2_3--> Allow: batch 3, 4, 5, 6 for multi-batch jobs
----
-- [ ] <!---ci_exclude_style--> Exclude: Style check
-- [ ] <!---ci_exclude_fast--> Exclude: Fast test
-- [ ] <!---ci_exclude_asan--> Exclude: All with ASAN
-- [ ] <!---ci_exclude_tsan|msan|ubsan|coverage--> Exclude: All with TSAN, MSAN, UBSAN, Coverage
-- [ ] <!---ci_exclude_aarch64|release|debug--> Exclude: All with aarch64
-- [ ] <!---ci_exclude_release--> Exclude: All with release
-- [ ] <!---ci_exclude_debug--> Exclude: All with debug
+- [ ] <!---ci_include_stateless--> Only: Stateless tests
+- [ ] <!---ci_include_stateful--> Only: Stateful tests
+- [ ] <!---ci_include_integration--> Only: Integration tests
+- [ ] <!---ci_include_performance--> Only: Performance tests
 ---
-- [ ] <!---ci_include_uzz--> Run only fuzzers related jobs (libFuzzer fuzzers, AST fuzzers, BuzzHouse, etc.)
-- [ ] <!---ci_exclude_ast--> Exclude: AST fuzzers
+- [ ] <!---ci_exclude_style--> Skip: Style check
+- [ ] <!---ci_exclude_fast--> Skip: Fast test
 ---
-- [ ] <!---do_not_test--> Do not test
-- [ ] <!---woolen_wolfdog--> Woolen Wolfdog
-- [ ] <!---upload_all--> Upload binaries for special builds
-- [ ] <!---no_merge_commit--> Disable merge-commit
+- [ ] <!---woolen_wolfdog--> Non-blocking CI mode (Resource-intensive. All test jobs execute in parallel).
+- [ ] <!---no_merge_commit--> Disable merge-commit (Run CI on branch HEAD instead of merge commit with target branch)
 - [ ] <!---no_ci_cache--> Disable CI cache
diff --git a/contrib/SimSIMD b/contrib/SimSIMD
diff --git a/contrib/usearch b/contrib/usearch
diff --git a/docs/en/engines/table-engines/mergetree-family/annindexes.md b/docs/en/engines/table-engines/mergetree-family/annindexes.md
@@ -71,8 +71,8 @@ ORDER BY id;
 USearch indexes are currently experimental, to use them you first need to `SET allow_experimental_vector_similarity_index = 1`.
 :::
 
-The index can be build on a column of type [Array(Float64)](../../../sql-reference/data-types/array.md),
-[Array(Float32)](../../../sql-reference/data-types/array.md), or [Array(BFloat16)](../../../sql-reference/data-types/array.md).
+The index can be built on columns of type [Array(Float64)](../../../sql-reference/data-types/array.md) or
+[Array(Float32)](../../../sql-reference/data-types/array.md).
 
 Index parameters:
 - `method`: Currently only `hnsw` is supported.

diff --git a/docs/en/interfaces/cli.md b/docs/en/interfaces/cli.md
diff --git a/docs/en/interfaces/formats.md b/docs/en/interfaces/formats.md
diff --git a/docs/en/interfaces/formats/CSV/CSV.md b/docs/en/interfaces/formats/CSV/CSV.md
diff --git a/docs/en/interfaces/formats/Template/Template.md b/docs/en/interfaces/formats/Template/Template.md
@@ -2,72 +2,142 @@
 title : Template
 slug : /en/interfaces/formats/Template
 keywords : [Template]
+input_format: true
+output_format: true
+alias: []
 ---
 
+| Input | Output | Alias |
+|-------|--------|-------|
+| ✔     | ✔      |       |
+
 ## Description
 
-This format allows specifying a custom format string with placeholders for values with a specified escaping rule.
+For cases where you need more customization than other standard formats offer,
+the `Template` format lets you specify a custom format string with placeholders for values,
+along with escaping rules for the data.
+
+It uses the following settings:
+
+| Setting                                                                                                  | Description                                                                                                                |
+|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------|
+| [`format_template_row`](#format_template_row)                                                            | Specifies the path to the file which contains format strings for rows.                                                     |
+| [`format_template_resultset`](#format_template_resultset)                                                | Specifies the path to the file which contains the format string for the result set.                                       |
+| [`format_template_rows_between_delimiter`](#format_template_rows_between_delimiter)                      | Specifies the delimiter between rows, which is printed (or expected) after every row except the last one (`\n` by default) |
+| `format_template_row_format`                                                                             | Specifies the format string for rows [in-line](#inline_specification).                                                     |                                                                           
+| `format_template_resultset_format`                                                                       | Specifies the result set format string [in-line](#inline_specification).                                                   |
+| Some settings of other formats (e.g. `output_format_json_quote_64bit_integers` when using `JSON` escaping) |                                                                                                                          |
 
-It uses settings `format_template_resultset`, `format_template_row` (`format_template_row_format`), `format_template_rows_between_delimiter` and some settings of other formats (e.g. `output_format_json_quote_64bit_integers` when using `JSON` escaping, see further)
+## Settings And Escaping Rules
 
-Setting `format_template_row` specifies the path to the file containing format strings for rows with the following syntax:
+### format_template_row
 
-`delimiter_1${column_1:serializeAs_1}delimiter_2${column_2:serializeAs_2} ... delimiter_N`,
+The setting `format_template_row` specifies the path to the file which contains format strings for rows with the following syntax:
 
-where `delimiter_i` is a delimiter between values (`$` symbol can be escaped as `$$`),
-`column_i` is a name or index of a column whose values are to be selected or inserted (if empty, then column will be skipped),
-`serializeAs_i` is an escaping rule for the column values. The following escaping rules are supported:
+```
+delimiter_1${column_1:serializeAs_1}delimiter_2${column_2:serializeAs_2} ... delimiter_N
+```
 
-- `CSV`, `JSON`, `XML` (similar to the formats of the same names)
-- `Escaped` (similar to `TSV`)
-- `Quoted` (similar to `Values`)
-- `Raw` (without escaping, similar to `TSVRaw`)
-- `None` (no escaping rule, see further)
+Where:
 
+| Part of syntax | Description                                                                                                       |
+|----------------|-------------------------------------------------------------------------------------------------------------------|
+| `delimiter_i`  | A delimiter between values (`$` symbol can be escaped as `$$`)                                                    |
+| `column_i`     | The name or index of a column whose values are to be selected or inserted (if empty, then the column will be skipped) |
+|`serializeAs_i` | An escaping rule for the column values.                                                                           |
+
+The following escaping rules are supported:
+
+| Escaping Rule        | Description                              |
+|----------------------|------------------------------------------|
+| `CSV`, `JSON`, `XML` | Similar to the formats of the same names |
+| `Escaped`            | Similar to `TSV`                         |
+| `Quoted`             | Similar to `Values`                      |
+| `Raw`                | Without escaping, similar to `TSVRaw`    |   
+| `None`               | No escaping rule - see note below        |
+
+:::note
 If an escaping rule is omitted, then `None` will be used. `XML` is suitable only for output.
+:::
+
+Let's look at an example. Given the following format string:
 
-So, for the following format string:
+```
+Search phrase: ${s:Quoted}, count: ${c:Escaped}, ad price: $$${p:JSON};
+```
+
+The following values will be printed (if using `SELECT`) or expected (if using `INSERT`)
+between the delimiters `Search phrase:`, `, count:`, `, ad price: $` and `;` respectively:
 
-      `Search phrase: ${SearchPhrase:Quoted}, count: ${c:Escaped}, ad price: $$${price:JSON};`
+- `s` (with escape rule `Quoted`)
+- `c` (with escape rule `Escaped`)
+- `p` (with escape rule `JSON`)
 
-the values of `SearchPhrase`, `c` and `price` columns, which are escaped as `Quoted`, `Escaped` and `JSON` will be printed (for select) or will be expected (for insert) between `Search phrase:`, `, count:`, `, ad price: $` and `;` delimiters respectively. For example:
+For example:
 
-`Search phrase: 'bathroom interior design', count: 2166, ad price: $3;`
+- If `INSERT`ing, the line below matches the expected template, and the values `bathroom interior design`, `2166` and `3` are read into columns `s`, `c` and `p`.
+- If `SELECT`ing, the line below is the output, assuming that the values `bathroom interior design`, `2166` and `3` are already stored in a table under columns `s`, `c` and `p`.
 
-In cases where it is challenging or not possible to deploy format output configuration for the template format to a directory on all nodes in a cluster, or if the format is trivial then `format_template_row_format` can be used to set the template string directly in the query, rather than a path to the file which contains it.
+```
+Search phrase: 'bathroom interior design', count: 2166, ad price: $3;
+```
 
-The `format_template_rows_between_delimiter` setting specifies the delimiter between rows, which is printed (or expected) after every row except the last one (`\n` by default)
+### format_template_rows_between_delimiter
 
-Setting `format_template_resultset` specifies the path to the file, which contains a format string for resultset. Setting `format_template_resultset_format` can be used to set the template string for the result set directly in the query itself. Format string for resultset has the same syntax as a format string for row and allows to specify a prefix, a suffix and a way to print some additional information. It contains the following placeholders instead of column names:
+The setting `format_template_rows_between_delimiter` specifies the delimiter between rows, which is printed (or expected) after every row except the last one (`\n` by default).
+
+### format_template_resultset
+
+The setting `format_template_resultset` specifies the path to the file, which contains a format string for the result set. 
+
+The format string for the result set has the same syntax as a format string for rows. 
+It allows for specifying a prefix, a suffix and a way to print some additional information and contains the following placeholders instead of column names:
 
 - `data` is the rows with data in `format_template_row` format, separated by `format_template_rows_between_delimiter`. This placeholder must be the first placeholder in the format string.
-- `totals` is the row with total values in `format_template_row` format (when using WITH TOTALS)
-- `min` is the row with minimum values in `format_template_row` format (when extremes are set to 1)
-- `max` is the row with maximum values in `format_template_row` format (when extremes are set to 1)
-- `rows` is the total number of output rows
+- `totals` is the row with total values in `format_template_row` format (when using WITH TOTALS).
+- `min` is the row with minimum values in `format_template_row` format (when extremes are set to 1).
+- `max` is the row with maximum values in `format_template_row` format (when extremes are set to 1).
+- `rows` is the total number of output rows.
 - `rows_before_limit` is the minimal number of rows there would have been without LIMIT. Output only if the query contains LIMIT. If the query contains GROUP BY, rows_before_limit_at_least is the exact number of rows there would have been without a LIMIT.
-- `time` is the request execution time in seconds
-- `rows_read` is the number of rows has been read
-- `bytes_read` is the number of bytes (uncompressed) has been read
+- `time` is the request execution time in seconds.
+- `rows_read` is the number of rows that have been read.
+- `bytes_read` is the number of bytes (uncompressed) that have been read.
 
 The placeholders `data`, `totals`, `min` and `max` must not have escaping rule specified (or `None` must be specified explicitly). The remaining placeholders may have any escaping rule specified.
+
+:::note
 If the `format_template_resultset` setting is an empty string, `${data}` is used as the default value.
+:::
+
 For insert queries, the format allows skipping some columns or fields, for example those in a prefix or suffix (see example).
 
+### In-line specification {#inline_specification}
+
+It is often challenging or impossible to deploy the format configurations
+(set by `format_template_row`, `format_template_resultset`) for the template format to a directory on all nodes in a cluster.
+Furthermore, the format may be so trivial that it does not need to be placed in a file.
+
+For these cases, `format_template_row_format` (for `format_template_row`) and `format_template_resultset_format` (for `format_template_resultset`) can be used to set the template string directly in the query, 
+rather than as a path to the file which contains it.
+
+:::note
+The rules for format strings and escape sequences are the same as those for:
+- [`format_template_row`](#format_template_row) when using `format_template_row_format`.
+- [`format_template_resultset`](#format_template_resultset) when using `format_template_resultset_format`.
+:::
+
 ## Example Usage
 
-### Selecting Data
+Let's look at two examples of how we can use the `Template` format, first for selecting data and then for inserting data.
 
-Select example:
+### Selecting Data
 
 ``` sql
 SELECT SearchPhrase, count() AS c FROM test.hits GROUP BY SearchPhrase ORDER BY c DESC LIMIT 5 FORMAT Template SETTINGS
 format_template_resultset = '/some/path/resultset.format', format_template_row = '/some/path/row.format', format_template_rows_between_delimiter = '\n    '
 ```
 
-`/some/path/resultset.format`:
-
-``` text
+```text title="/some/path/resultset.format"
 <!DOCTYPE HTML>
 <html> <head> <title>Search phrases</title> </head>
  <body>
@@ -83,15 +153,13 @@ format_template_resultset = '/some/path/resultset.format', format_template_row =
 </html>
 ```
 
-`/some/path/row.format`:
-
-``` text
+```text title="/some/path/row.format"
 <tr> <td>${0:XML}</td> <td>${1:XML}</td> </tr>
 ```
 
 Result:
 
-``` html
+```html
 <!DOCTYPE HTML>
 <html> <head> <title>Search phrases</title> </head>
  <body>
@@ -113,8 +181,6 @@ Result:
 
 ### Inserting Data
 
-Insert example:
-
 ``` text
 Some header
 Page views: 5, User id: 4324182021466249494, Useless field: hello, Duration: 146, Sign: -1
@@ -128,22 +194,16 @@ format_template_resultset = '/some/path/resultset.format', format_template_row =
 FORMAT Template
 ```
 
-`/some/path/resultset.format`:
-
-``` text
+```text title="/some/path/resultset.format"
 Some header\n${data}\nTotal rows: ${:CSV}\n
 ```
 
-`/some/path/row.format`:
-
-``` text
+```text title="/some/path/row.format"
 Page views: ${PageViews:CSV}, User id: ${UserID:CSV}, Useless field: ${:CSV}, Duration: ${Duration:CSV}, Sign: ${Sign:CSV}
 ```
 
 `PageViews`, `UserID`, `Duration` and `Sign` inside placeholders are names of columns in the table. Values after `Useless field` in rows and after `\nTotal rows:` in suffix will be ignored.
 All delimiters in the input data must be strictly equal to delimiters in specified format strings.
 
-## Format Settings
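
The row format syntax documented in this hunk can be sketched in Python for the output direction. This is a minimal illustration under stated assumptions, not ClickHouse's implementation: `render_row`, `RULES`, `quote` and `escape` are hypothetical names, and the escaping helpers only approximate the `Quoted` and `Escaped` rules. The template string and expected line are the ones from the documentation above.

```python
import json
import re

# A token is either an escaped dollar ("$$") or a placeholder "${column:rule}".
TOKEN = re.compile(r"\$\$|\$\{([^}:]*)(?::([^}]*))?\}")


def quote(value):
    # Simplified stand-in for the `Quoted` rule: strings go in single quotes.
    if isinstance(value, str):
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return str(value)


def escape(value):
    # Simplified stand-in for the `Escaped` (TSV-like) rule.
    return str(value).replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


# Hypothetical rule table; only a subset of the documented rules is modelled.
RULES = {
    "Quoted": quote,
    "Escaped": escape,
    "JSON": json.dumps,
    "Raw": str,
    "None": str,
}


def render_row(template, row):
    """Expand placeholders in `template` using values from the dict `row`."""

    def substitute(match):
        if match.group(0) == "$$":
            return "$"  # "$$" is the escaped form of a literal dollar sign
        column = match.group(1)
        rule = match.group(2) or "None"  # an omitted rule means `None`
        return RULES[rule](row[column])

    return TOKEN.sub(substitute, template)


template = "Search phrase: ${s:Quoted}, count: ${c:Escaped}, ad price: $$${p:JSON};"
row = {"s": "bathroom interior design", "c": 2166, "p": 3}
line = render_row(template, row)
print(line)
```

Note that the `$` before the price comes from the `$$` escape in the delimiter, so the stored value of `p` is `3`, not `$3`.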
-
 
 
diff --git a/docs/en/interfaces/formats/Template/TemplateIgnoreSpaces.md b/docs/en/interfaces/formats/Template/TemplateIgnoreSpaces.md
@@ -2,31 +2,45 @@
 title : TemplateIgnoreSpaces
 slug : /en/interfaces/formats/TemplateIgnoreSpaces
 keywords : [TemplateIgnoreSpaces]
+input_format: true
+output_format: false
+alias: []
 ---
 
+| Input | Output | Alias |
+|-------|--------|-------|
+| ✔     | ✗      |       |
+
 ## Description
 
+Similar to `Template`, but skips whitespace characters between delimiters and values in the input stream.
+However, if format strings contain whitespace characters, these characters will be expected in the input stream. 
+Also allows specifying empty placeholders (`${}` or `${:None}`) to split some delimiter into separate parts to ignore spaces between them. 
+Such placeholders are used only for skipping whitespace characters.
+It’s possible to read `JSON` using this format if the values of columns have the same order in all rows.
+
+:::note
 This format is suitable only for input.
-Similar to `Template`, but skips whitespace characters between delimiters and values in the input stream. However, if format strings contain whitespace characters, these characters will be expected in the input stream. Also allows specifying empty placeholders (`${}` or `${:None}`) to split some delimiter into separate parts to ignore spaces between them. Such placeholders are used only for skipping whitespace characters.
-It’s possible to read `JSON` using this format if the values of columns have the same order in all rows. For example, the following request can be used for inserting data from its output example of format [JSON](/docs/en/interfaces/formats/JSON):
+:::
 
 ## Example Usage
 
-``` sql
-INSERT INTO table_name SETTINGS
-format_template_resultset = '/some/path/resultset.format', format_template_row = '/some/path/row.format', format_template_rows_between_delimiter = ','
+The following request can be used to insert data from the output example of the [JSON](/docs/en/interfaces/formats/JSON) format:
+
+```sql
+INSERT INTO table_name 
+SETTINGS
+    format_template_resultset = '/some/path/resultset.format',
+    format_template_row = '/some/path/row.format',
+    format_template_rows_between_delimiter = ','
 FORMAT TemplateIgnoreSpaces
 ```
 
-`/some/path/resultset.format`:
-
-``` text
+```text title="/some/path/resultset.format"
 {${}"meta"${}:${:JSON},${}"data"${}:${}[${data}]${},${}"totals"${}:${:JSON},${}"extremes"${}:${:JSON},${}"rows"${}:${:JSON},${}"rows_before_limit_at_least"${}:${:JSON}${}}
 ```
 
-`/some/path/row.format`:
-
-``` text
+```text title="/some/path/row.format"
 {${}"SearchPhrase"${}:${}${phrase:JSON}${},${}"c"${}:${}${cnt:JSON}${}}
 ```
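
The whitespace handling described for `TemplateIgnoreSpaces` can be sketched in Python by turning a row format string into a regular expression. This is an illustration under stated assumptions, not ClickHouse's parser: `template_to_regex` and `parse_row` are hypothetical names, every captured value is decoded as JSON regardless of its rule, column names must be valid regex group names, and the non-greedy matching is not robust when a value itself contains the following delimiter.

```python
import json
import re

# A token is either an escaped dollar ("$$") or a placeholder "${column:rule}".
TOKEN = re.compile(r"\$\$|\$\{([^}:]*)(?::([^}]*))?\}")


def template_to_regex(template):
    """Build a regex that allows optional whitespace at every placeholder."""
    parts = []
    literal = ""
    pos = 0
    for match in TOKEN.finditer(template):
        literal += template[pos:match.start()]
        pos = match.end()
        if match.group(0) == "$$":
            literal += "$"  # escaped dollar stays part of the delimiter
            continue
        if literal:
            parts.append(re.escape(literal))
            literal = ""
        name = match.group(1)
        if name:
            parts.append(f"(?P<{name}>.+?)")
        # An empty placeholder such as "${}" adds no group; it only splits
        # the delimiter so that whitespace may appear at that point.
    literal += template[pos:]
    if literal:
        parts.append(re.escape(literal))
    return re.compile(r"\s*".join(parts))


def parse_row(template, line):
    """Parse one input row; assumes every named placeholder holds JSON."""
    match = template_to_regex(template).fullmatch(line.strip())
    if match is None:
        raise ValueError("input does not match the template")
    return {name: json.loads(text) for name, text in match.groupdict().items()}


row_format = '{${}"SearchPhrase"${}:${}${phrase:JSON}${},${}"c"${}:${}${cnt:JSON}${}}'
parsed = parse_row(row_format, '{ "SearchPhrase" : "bathroom interior design", "c" : 2166 }')
print(parsed)
```

Whitespace that appears literally inside a delimiter is escaped into the pattern and therefore still required in the input, which matches the behaviour described above.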
 

diff --git a/docs/en/operations/system-tables/settings.md b/docs/en/operations/system-tables/settings.md
@@ -9,7 +9,7 @@ Columns:
 
 - `name` ([String](../../sql-reference/data-types/string.md)) — Setting name.
 - `value` ([String](../../sql-reference/data-types/string.md)) — Setting value.
-- `changed` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Shows whether a setting is changed from its default value.
+- `changed` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Shows whether the setting was explicitly defined in the config or explicitly changed.
 - `description` ([String](../../sql-reference/data-types/string.md)) — Short setting description.
 - `min` ([Nullable](../../sql-reference/data-types/nullable.md)([String](../../sql-reference/data-types/string.md))) — Minimum value of the setting, if any is set via [constraints](../../operations/settings/constraints-on-settings.md#constraints-on-settings). If the setting has no minimum value, contains [NULL](../../sql-reference/syntax.md#null-literal).
 - `max` ([Nullable](../../sql-reference/data-types/nullable.md)([String](../../sql-reference/data-types/string.md))) — Maximum value of the setting, if any is set via [constraints](../../operations/settings/constraints-on-settings.md#constraints-on-settings). If the setting has no maximum value, contains [NULL](../../sql-reference/syntax.md#null-literal).
+5 −5		.github/workflows/prerelease.yml
+8 −8		.github/workflows/release.yml
+1 −1		CMakeLists.txt
+69 −12		CONTRIBUTING.md
+1 −1		Cargo.lock
+1 −1		Cargo.toml
+253 −40		README.md
+1 −1		VERSION
+114 −109		c/lib.c
+33 −17		include/simsimd/binary.h
+867 −129		include/simsimd/curved.h
+562 −569		include/simsimd/dot.h
+14 −9		include/simsimd/probability.h
+102 −46		include/simsimd/simsimd.h
+28 −18		include/simsimd/spatial.h
+24 −0		include/simsimd/types.h
+26 −15		javascript/fallback.ts
+1 −1		javascript/simsimd.ts
+10 −9		package-lock.json
+2 −2		package.json
+6 −6		pyproject.toml
+22 −29		python/lib.c
+218 −5		rust/lib.rs
+156 −97		scripts/bench.cxx
+14 −8		scripts/test.c
+158 −145		scripts/test.mjs
+71 −52		scripts/test.py
+4 −1		.vscode/settings.json
+1 −1		CITATION.cff
+13 −1		CMakeLists.txt
+1 −1		Cargo.lock
+1 −1		Cargo.toml
+3 −1		README.md
+1 −1		VERSION
+1 −1		conanfile.py
+1 −1		csharp/nuget/nuget-package.props
+1 −1		include/usearch/index.hpp
+2 −0		include/usearch/index_plugins.hpp
+1 −1		java/README.md
+1 −1		package-lock.json
+1 −1		package.json
+12 −1		python/lib.cpp
+26 −0		python/scripts/test_index.py
+1 −1		wasmer.toml