forked from ClickHouse/clickhouse-docs
Showing 116 changed files with 5,456 additions and 1,295 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,7 @@ | ||
position: 1 | ||
label: 'Cloud' | ||
label: 'Benefits' | ||
collapsible: true | ||
collapsed: false | ||
collapsed: true | ||
link: | ||
type: doc | ||
id: en/cloud/index |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,5 @@ | ||
label: 'Cloud Reference' | ||
position: 1 | ||
collapsible: true | ||
collapsed: true | ||
link: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
position: 1 | ||
label: 'Concepts' | ||
link: | ||
type: generated-index | ||
slug: /en/cconcepts |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
--- | ||
sidebar_position: 1 | ||
sidebar_label: Creating Tables | ||
--- | ||
|
||
# Creating Tables in ClickHouse | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
--- | ||
sidebar_position: 2 | ||
sidebar_label: Inserting Data | ||
--- | ||
|
||
# Inserting Data into ClickHouse | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
--- | ||
sidebar_position: 2 | ||
sidebar_label: What is OLAP? | ||
description: "OLAP stands for Online Analytical Processing. It is a broad term that can be looked at from two perspectives: technical and business." | ||
--- | ||
|
||
# What is OLAP? | ||
|
||
<!-- slug: /en/faq/general/olap --> | ||
|
||
|
||
[OLAP](https://en.wikipedia.org/wiki/Online_analytical_processing) stands for Online Analytical Processing. It is a broad term that can be looked at from two perspectives: technical and business. At the highest level, though, you can simply read these words backward: | ||
|
||
Processing | ||
: Some source data is processed… | ||
|
||
Analytical | ||
: …to produce some analytical reports and insights… | ||
|
||
Online | ||
: …in real-time. | ||
|
||
## OLAP from the Business Perspective {#olap-from-the-business-perspective} | ||
|
||
In recent years, business people have started to realize the value of data. Companies that make their decisions blindly more often than not fail to keep up with the competition. The data-driven approach of successful companies forces them to collect all data that might be even remotely useful for making business decisions, and they need mechanisms to analyze it in a timely manner. This is where OLAP database management systems (DBMS) come in. | ||
|
||
In a business sense, OLAP allows companies to continuously plan, analyze, and report on operational activities, thus maximizing efficiency, reducing expenses, and ultimately gaining market share. This can be done either in an in-house system or outsourced to SaaS providers such as web/mobile analytics services, CRM services, etc. OLAP is the technology behind many Business Intelligence (BI) applications. | ||
|
||
ClickHouse is an OLAP database management system that is often used as a backend for those SaaS solutions for analyzing domain-specific data. However, some businesses are still reluctant to share their data with third-party providers, so an in-house data warehouse scenario is also viable. | ||
|
||
## OLAP from the Technical Perspective {#olap-from-the-technical-perspective} | ||
|
||
All database management systems can be classified into two groups: OLAP (Online **Analytical** Processing) and OLTP (Online **Transactional** Processing). The former focuses on building reports, each based on large volumes of historical data, but does so relatively infrequently. The latter usually handles a continuous stream of transactions, constantly modifying the current state of the data. | ||
|
||
In practice, OLAP and OLTP are not strict categories; they are more like a spectrum. Most real systems focus on one of them but provide some solutions or workarounds if the opposite kind of workload is also desired. This situation often forces businesses to operate multiple integrated storage systems, which might not be a big deal, but having more systems makes them more expensive to maintain. So the trend of recent years is HTAP (**Hybrid Transactional/Analytical Processing**), where both kinds of workload are handled equally well by a single database management system. | ||
|
||
Even if a DBMS started as pure OLAP or pure OLTP, it is forced to move in the HTAP direction to keep up with the competition. ClickHouse is no exception: it was initially designed as a [fast-as-possible OLAP system](../../faq/general/why-clickhouse-is-so-fast.md) and still does not have full-fledged transaction support, but some features, such as consistent reads/writes and mutations for updating/deleting data, had to be added. | ||
|
||
The fundamental trade-off between OLAP and OLTP systems remains: | ||
|
||
- To build analytical reports efficiently, it is crucial to be able to read columns separately, so most OLAP databases are [columnar](../../faq/general/columnar-database.md). | ||
- Storing columns separately, however, increases the cost of row operations such as appends or in-place modifications in proportion to the number of columns (which can be huge if a system tries to collect every detail of an event just in case). For this reason, most OLTP systems store data arranged by rows. |
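
The trade-off above can be sketched in a few lines of Python. This is only an illustration, not how ClickHouse stores data: plain lists stand in for on-disk storage, and the table, column names, and helper functions are all hypothetical.

```python
# Illustrative only: plain Python lists stand in for on-disk storage.
rows = [
    {"user_id": 1, "country": "US", "duration_ms": 120},
    {"user_id": 2, "country": "DE", "duration_ms": 340},
    {"user_id": 3, "country": "US", "duration_ms": 95},
]

# Column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_duration_columnar(cols):
    """Analytical read: touches a single column, however wide the table is."""
    return sum(cols["duration_ms"])


def append_row_columnar(cols, row):
    """Row write: touches every column, so the cost grows with column count."""
    writes = 0
    for name, value in row.items():
        cols[name].append(value)
        writes += 1
    return writes


total = total_duration_columnar(columns)
writes = append_row_columnar(
    columns, {"user_id": 4, "country": "FR", "duration_ms": 60}
)
```

The aggregate reads one list regardless of how many columns exist, while appending a single row performs one write per column.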
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
--- | ||
sidebar_position: 1 | ||
sidebar_label: Why is ClickHouse so Fast? | ||
description: "It was designed to be fast. Query execution performance has always been a top priority during the development process, but other important characteristics like user-friendliness, scalability, and security were also considered so ClickHouse could become a real production system." | ||
--- | ||
|
||
# Why is ClickHouse so fast? {#why-clickhouse-is-so-fast} | ||
|
||
<!-- slug: /en/faq/general/why-clickhouse-is-so-fast --> | ||
|
||
It was designed to be fast. Query execution performance has always been a top priority during the development process, but other important characteristics like user-friendliness, scalability, and security were also considered so ClickHouse could become a real production system. | ||
|
||
### "Building for Fast", Alexey Milovidov (CTO, ClickHouse) | ||
|
||
<iframe width="675" height="380" src="https://www.youtube.com/embed/CAS2otEoerM" frameborder="0" allow="accelerometer; autoplay; gyroscope; picture-in-picture" allowfullscreen></iframe> | ||
|
||
["Building for Fast"](https://www.youtube.com/watch?v=CAS2otEoerM) talk from ClickHouse Meetup Amsterdam, June 2022. | ||
|
||
["Secrets of ClickHouse Performance Optimizations"](https://www.youtube.com/watch?v=ZOZQCQEtrz8) talk from Big Data Technology Conference, December 2019, offers a more technical take on the same topic. | ||
|
||
## What Makes ClickHouse so Fast? | ||
|
||
### Architecture choices | ||
|
||
ClickHouse was initially built as a prototype to do just a single task well: to filter and aggregate data as fast as possible. That’s what needs to be done to build a typical analytical report, and that’s what a typical [GROUP BY](../../sql-reference/statements/select/group-by.md) query does. The ClickHouse team has made several high-level decisions that, when combined, made achieving this task possible: | ||
|
||
**Column-oriented storage:** Source data often contain hundreds or even thousands of columns, while a report can use just a few of them. The system needs to avoid reading unnecessary columns to avoid expensive disk read operations. | ||
|
||
**Indexes:** ClickHouse keeps data structures in memory that allow it to read only the necessary columns, and only the necessary row ranges of those columns. | ||
|
||
**Data compression:** Storing different values of the same column together often leads to better compression ratios (compared to row-oriented systems) because in real data a column often has the same, or not so many different, values for neighboring rows. In addition to general-purpose compression, ClickHouse supports [specialized codecs](../../sql-reference/statements/create/table.md/#specialized-codecs) that can make data even more compact. | ||
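
A toy illustration of why column layout helps compression, assuming run-length encoding as a stand-in codec (it is not one of the codecs referenced above) and made-up sample data:

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Two hypothetical columns whose neighboring rows repeat the same value.
country = ["US", "US", "US", "US", "DE", "DE", "DE"]
status = [200, 200, 200, 200, 200, 200, 200]

# Column-oriented: each column is encoded on its own.
columnar_runs = run_length_encode(country) + run_length_encode(status)

# Row-oriented: values of different columns are interleaved,
# so equal values are no longer adjacent.
row_stream = [value for pair in zip(country, status) for value in pair]
row_runs = run_length_encode(row_stream)
```

With this sample, the columnar encoding needs 3 runs, while the interleaved row stream needs 14, one per value.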
|
||
**Vectorized query execution:** ClickHouse not only stores data in columns but also processes data in columns. This leads to better CPU cache utilization and allows the use of [SIMD](https://en.wikipedia.org/wiki/SIMD) CPU instructions. | ||
|
||
**Scalability:** ClickHouse can leverage all available CPU cores and disks to execute even a single query, not only on a single server but across all CPU cores and disks of a cluster as well. | ||
|
||
### Attention to Low-Level Details | ||
|
||
But many other database management systems use similar techniques. What really makes ClickHouse stand out is **attention to low-level details**. Most programming languages provide implementations of the most common algorithms and data structures, but they tend to be too generic to be effective. Every task can be viewed as a landscape with its own characteristics, and should be approached that way rather than by throwing in an arbitrary implementation. For example, if you need a hash table, here are some key questions to consider: | ||
|
||
- Which hash function to choose? | ||
- Collision resolution algorithm: [open addressing](https://en.wikipedia.org/wiki/Open_addressing) vs [chaining](https://en.wikipedia.org/wiki/Hash_table#Separate_chaining)? | ||
- Memory layout: one array for keys and values or separate arrays? Will it store small or large values? | ||
- Fill factor: when and how to resize? How to move values around on resize? | ||
- Will values be removed, and if so, which algorithm will work better? | ||
- Will we need fast probing with bitmaps, inline placement of string keys, support for non-movable values, prefetch, and batching? | ||
|
||
The hash table is a key data structure for the `GROUP BY` implementation, and ClickHouse automatically chooses one of [30+ variations](https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/Aggregator.h) for each specific query. | ||
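
As a rough sketch of what choosing a variation per query can mean, the Python below counts rows per key with two interchangeable structures. The function names and the dispatch rule are hypothetical and far simpler than anything in `Aggregator.h`; they only illustrate specializing on the key type.

```python
from collections import defaultdict


def count_by_key_generic(keys):
    """General case: a hash table keyed by arbitrary hashable values."""
    counts = defaultdict(int)
    for key in keys:
        counts[key] += 1
    return dict(counts)


def count_by_key_uint8(keys):
    """Special case: keys fit in 8 bits, so a 256-slot array replaces hashing."""
    slots = [0] * 256
    for key in keys:
        slots[key] += 1
    return {key: n for key, n in enumerate(slots) if n}


def count_by_key(keys):
    """Pick a variant from the key type and range (hypothetical dispatch rule)."""
    if keys and all(type(k) is int and 0 <= k < 256 for k in keys):
        return count_by_key_uint8(keys)
    return count_by_key_generic(keys)


small_int_counts = count_by_key([3, 1, 3, 3])
string_counts = count_by_key(["a", "b", "a"])
```

Both variants return the same kind of result, so the caller does not need to know which one was selected.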
|
||
The same goes for algorithms. For example, in sorting you might consider: | ||
|
||
- What will be sorted: an array of numbers, tuples, strings, or structures? | ||
- Is all data available completely in RAM? | ||
- Do we need a stable sort? | ||
- Do we need a full sort? Maybe partial sort or n-th element will suffice? | ||
- How to implement comparisons? | ||
- Are we sorting data that has already been partially sorted? | ||
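
To make the "full sort or partial sort" question concrete, here is a minimal Python sketch using the standard `heapq` module. The sample values and function names are made up; the point is only that both approaches return the same answer while the partial one avoids ordering the whole input.

```python
import heapq


def smallest_n_full_sort(values, n):
    """Sort everything, then keep the first n items."""
    return sorted(values)[:n]


def smallest_n_partial(values, n):
    """Keep only n candidates while scanning; the rest is never fully ordered."""
    return heapq.nsmallest(n, values)


latencies_ms = [42, 7, 19, 3, 88, 7, 15]
full = smallest_n_full_sort(latencies_ms, 3)
partial = smallest_n_partial(latencies_ms, 3)
```

When only a few leading rows are needed from a large input, the partial variant does less work than a full sort.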
|
||
Algorithms that rely on the characteristics of the data they work with can often do better than their generic counterparts. If those characteristics are not known in advance, the system can try various implementations and choose the one that works best at runtime. For example, see an [article on how LZ4 decompression is implemented in ClickHouse](https://habr.com/en/company/yandex/blog/457612/). | ||
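
A minimal sketch of the "try several implementations and keep the fastest" idea, assuming a simple one-shot timing on a sample. The candidate functions and `pick_fastest` are hypothetical, and this is not the selection scheme described in the linked article.

```python
import time


def sum_builtin(values):
    """Candidate 1: rely on the built-in implementation."""
    return sum(values)


def sum_loop(values):
    """Candidate 2: an explicit loop producing the same result."""
    total = 0
    for v in values:
        total += v
    return total


def pick_fastest(candidates, sample):
    """Time each candidate once on a sample and return the quickest one."""
    best, best_time = None, float("inf")
    for fn in candidates:
        start = time.perf_counter()
        fn(sample)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = fn, elapsed
    return best


data = list(range(100_000))
chosen = pick_fastest([sum_builtin, sum_loop], data[:1_000])
result = chosen(data)
```

Which candidate wins depends on the machine and the sample, so a real system would repeat the measurement and keep statistics rather than trust a single run.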
|
||
Last but not least, the ClickHouse team always monitors the Internet for people claiming that they have come up with the best implementation, algorithm, or data structure for something, and tries it out. Those claims mostly turn out to be false, but from time to time you’ll indeed find a gem. | ||
|
||
:::info Tips for building your own high-performance software | ||
- Keep in mind low-level details when designing your system. | ||
- Design based on hardware capabilities. | ||
- Choose data structures and abstractions based on the needs of the task. | ||
- Provide specializations for special cases. | ||
- Try new, "best" algorithms that you read about yesterday. | ||
- Choose an algorithm at runtime based on statistics. | ||
- Benchmark on real datasets. | ||
- Test for performance regressions in CI. | ||
- Measure and observe everything. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
--- | ||
sidebar_position: 3 | ||
sidebar_label: Writing Queries | ||
--- | ||
|
||
# Writing Queries in ClickHouse | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Client APIs | ||
|
||
There are many ways to integrate with ClickHouse. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
--- | ||
sidebar_position: 0.1 | ||
slug: /en/faq | ||
--- | ||
|
||
# Frequently Asked Questions | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
--- | ||
sidebar_label: Introduction | ||
sidebar_position: 1 | ||
keywords: [clickhouse, go, client, golang] | ||
slug: /en/integrations/go/intro | ||
description: The Go clients for ClickHouse allow users to connect to ClickHouse using either the Go standard database/sql interface or an optimized native interface. | ||
--- | ||
|
||
# ClickHouse Go Client | ||
|
||
ClickHouse supports two official Go clients. These clients are complementary and intentionally support different use cases. | ||
|
||
* [clickhouse-go](https://github.com/ClickHouse/clickhouse-go) - High-level language client that supports either the Go standard database/sql interface or the native interface. | ||
* [ch-go](https://github.com/ClickHouse/ch-go) - Low-level client. Native interface only. | ||
|
||
clickhouse-go provides a high-level interface, allowing users to query and insert data using row-orientated semantics and batching that are lenient with respect to data types: values will be converted provided no precision loss can occur. ch-go, meanwhile, provides an optimized column-orientated interface that offers fast data block streaming with low CPU and memory overhead, at the expense of type strictness and more complex usage. | ||
|
||
From version 2.3, clickhouse-go utilizes ch-go for low-level functions such as encoding, decoding, and compression. Note that clickhouse-go also supports the Go `database/sql` interface standard. Both clients use the native format for their encoding to provide optimal performance and can communicate over the native ClickHouse protocol. clickhouse-go also supports HTTP as its transport mechanism for cases where users need to proxy or load balance traffic. | ||
|
||
When choosing a client library, users should be aware of their respective pros and cons - see Choosing a Client Library. | ||
|
||
<div class="adopters-table"> | ||
|
||
| | Native format | Native protocol | HTTP protocol | Row Orientated API | Column Orientated API | Type flexibility | Compression | Query Placeholders | | ||
|:-------------:|:-------------:|:---------------:|:-------------:|:------------------:|:---------------------:|:----------------:|:-----------:|:------------------:| | ||
| clickhouse-go | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | ||
| ch-go | ✅ | ✅ | | | ✅ | | ✅ | | | ||
|
||
</div> |