Initial copy of doc-preview
rfraposa committed Mar 10, 2023
1 parent 600eeea commit 111174b
Showing 116 changed files with 5,456 additions and 1,295 deletions.
6 changes: 3 additions & 3 deletions docs/en/about-us/distinctive-features.md
@@ -1,13 +1,13 @@
---
slug: /en/about-us/distinctive-features
-sidebar_label: Distinctive Features
+sidebar_label: Why is ClickHouse unique?
sidebar_position: 50
description: Understand what makes ClickHouse stand apart from other database management systems
---

# Distinctive Features of ClickHouse

## True Column-Oriented Database Management System

In a true column-oriented DBMS, no extra data is stored alongside the values. Among other things, this means that constant-length values must be supported, so that their length is not stored as an extra “number” next to each value. For example, a billion UInt8-type values should consume around 1 GB uncompressed; otherwise CPU use suffers. It is essential to store data compactly (without any “garbage”) even when uncompressed, since the speed of decompression (CPU usage) depends mainly on the volume of uncompressed data.

2 changes: 1 addition & 1 deletion docs/en/cloud-index.md
@@ -1,7 +1,7 @@
---
slug: /en/cloud/overview
keywords: [AWS, Cloud, serverless]
-title: Cloud
+title: Benefits
hide_title: true
---
import Content from '@site/docs/en/about-us/cloud.md';
4 changes: 2 additions & 2 deletions docs/en/cloud/_category_.yml
@@ -1,7 +1,7 @@
position: 1
-label: 'Cloud'
+label: 'Benefits'
collapsible: true
-collapsed: false
+collapsed: true
link:
type: doc
id: en/cloud/index
1 change: 1 addition & 0 deletions docs/en/cloud/reference/_category_.yml
@@ -1,4 +1,5 @@
label: 'Cloud Reference'
+position: 1
collapsible: true
collapsed: true
link:
5 changes: 5 additions & 0 deletions docs/en/concepts/_category_.yml
@@ -0,0 +1,5 @@
position: 1
label: 'Concepts'
link:
type: generated-index
    slug: /en/concepts
7 changes: 7 additions & 0 deletions docs/en/concepts/creating-tables.md
@@ -0,0 +1,7 @@
---
sidebar_position: 1
sidebar_label: Creating Tables
---

# Creating Tables in ClickHouse

7 changes: 7 additions & 0 deletions docs/en/concepts/inserting-data.md
@@ -0,0 +1,7 @@
---
sidebar_position: 2
sidebar_label: Inserting Data
---

# Inserting Data into ClickHouse

42 changes: 42 additions & 0 deletions docs/en/concepts/olap.md
@@ -0,0 +1,42 @@
---
sidebar_position: 2
sidebar_label: What is OLAP?
description: "OLAP stands for Online Analytical Processing. It is a broad term that can be looked at from two perspectives: technical and business."
---

# What is OLAP?

<!-- slug: /en/faq/general/olap -->


[OLAP](https://en.wikipedia.org/wiki/Online_analytical_processing) stands for Online Analytical Processing. It is a broad term that can be looked at from two perspectives: technical and business. But at a very high level, you can just read these words backward:

Processing
: Some source data is processed…

Analytical
: …to produce some analytical reports and insights…

Online
: …in real-time.

## OLAP from the Business Perspective {#olap-from-the-business-perspective}

In recent years, business people have started to realize the value of data. Companies that make their decisions blindly more often than not fail to keep up with the competition. The data-driven approach of successful companies forces them to collect all data that might be remotely useful for making business decisions, and they need mechanisms to analyze that data in a timely manner. This is where OLAP database management systems (DBMS) come in.

In a business sense, OLAP allows companies to continuously plan, analyze, and report on operational activities, thus maximizing efficiency, reducing expenses, and ultimately conquering market share. It can be done either in an in-house system or outsourced to SaaS providers such as web/mobile analytics services, CRM services, and so on. OLAP is the technology behind many Business Intelligence (BI) applications.

ClickHouse is an OLAP database management system that is quite often used as a backend for such SaaS solutions for analyzing domain-specific data. However, some businesses are still reluctant to share their data with third-party providers, so an in-house data warehouse is also a viable scenario.

## OLAP from the Technical Perspective {#olap-from-the-technical-perspective}

All database management systems can be classified into two groups: OLAP (Online **Analytical** Processing) and OLTP (Online **Transactional** Processing). The former focuses on building reports, each based on large volumes of historical data, but doing so relatively infrequently, while the latter usually handles a continuous stream of transactions, constantly modifying the current state of data.

In practice OLAP and OLTP are not strict categories; it's more of a spectrum. Most real systems focus on one of them but provide some solutions or workarounds if the opposite kind of workload is also desired. This situation often forces businesses to operate multiple integrated storage systems, which might not be a big deal in itself, but having more systems makes maintenance more expensive. So the trend of recent years is HTAP (**Hybrid Transactional/Analytical Processing**), where both kinds of workload are handled equally well by a single database management system.

Even if a DBMS started out as pure OLAP or pure OLTP, it is forced to move in the HTAP direction to keep up with the competition. ClickHouse is no exception: it was initially designed as a [fast-as-possible OLAP system](../../faq/general/why-clickhouse-is-so-fast.md) and it still does not have full-fledged transaction support, but features such as consistent reads/writes and mutations for updating and deleting data had to be added.

The fundamental trade-off between OLAP and OLTP systems remains:

- To build analytical reports efficiently, it's crucial to be able to read columns separately, so most OLAP databases are [columnar](../../faq/general/columnar-database.md).
- Storing columns separately, however, increases the cost of operations on whole rows, such as appends or in-place modifications, proportionally to the number of columns (which can be huge if the system tries to collect all details of an event just in case). Thus, most OLTP systems store data arranged by rows. The sketch below illustrates this trade-off.
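
To make the trade-off concrete, here is a minimal Go sketch (with invented event and column names) of the two layouts: computing an aggregate over one column touches only that column's array in the columnar layout, while the row layout drags every field of every event through memory, and appending a row in the columnar layout has to touch every column array.

```go
package main

import "fmt"

// Row-oriented layout: all fields of an event live together.
type Event struct {
	UserID   uint64
	URL      string
	Duration uint32
	// ...potentially hundreds of other columns...
}

// Column-oriented layout: each column is its own contiguous array.
type EventColumns struct {
	UserID   []uint64
	URL      []string
	Duration []uint32
}

// With rows, avg(Duration) still pulls whole Event values through memory.
func avgDurationRows(events []Event) float64 {
	var sum uint64
	for _, e := range events {
		sum += uint64(e.Duration)
	}
	return float64(sum) / float64(len(events))
}

// With columns, the same query scans only the Duration array.
func avgDurationColumns(c EventColumns) float64 {
	var sum uint64
	for _, d := range c.Duration {
		sum += uint64(d)
	}
	return float64(sum) / float64(len(c.Duration))
}

func main() {
	rows := []Event{{1, "/a", 120}, {2, "/b", 80}, {3, "/c", 100}}
	cols := EventColumns{
		UserID:   []uint64{1, 2, 3},
		URL:      []string{"/a", "/b", "/c"},
		Duration: []uint32{120, 80, 100},
	}
	fmt.Println(avgDurationRows(rows), avgDurationColumns(cols)) // 100 100
}
```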
72 changes: 72 additions & 0 deletions docs/en/concepts/why-clickhouse-is-so-fast.md
@@ -0,0 +1,72 @@
---
sidebar_position: 1
sidebar_label: Why is ClickHouse so Fast?
description: "It was designed to be fast. Query execution performance has always been a top priority during the development process, but other important characteristics like user-friendliness, scalability, and security were also considered so ClickHouse could become a real production system."
---

# Why is ClickHouse so fast? {#why-clickhouse-is-so-fast}

<!-- slug: /en/faq/general/why-clickhouse-is-so-fast -->

It was designed to be fast. Query execution performance has always been a top priority during the development process, but other important characteristics like user-friendliness, scalability, and security were also considered so ClickHouse could become a real production system.

### "Building for Fast", Alexey Milovidov (CTO, ClickHouse)

<iframe width="675" height="380" src="https://www.youtube.com/embed/CAS2otEoerM" frameborder="0" allow="accelerometer; autoplay; gyroscope; picture-in-picture" allowfullscreen></iframe>

["Building for Fast"](https://www.youtube.com/watch?v=CAS2otEoerM) talk from ClickHouse Meetup Amsterdam, June 2022.

["Secrets of ClickHouse Performance Optimizations"](https://www.youtube.com/watch?v=ZOZQCQEtrz8) talk from Big Data Technology Conference, December 2019, offers a more technical take on the same topic.

## What Makes ClickHouse so Fast?

### Architecture choices

ClickHouse was initially built as a prototype to do just a single task well: to filter and aggregate data as fast as possible. That’s what needs to be done to build a typical analytical report, and that’s what a typical [GROUP BY](../../sql-reference/statements/select/group-by.md) query does. The ClickHouse team has made several high-level decisions that, when combined, made achieving this task possible:

**Column-oriented storage:** Source data often contain hundreds or even thousands of columns, while a report can use just a few of them. The system needs to skip unnecessary columns to avoid expensive disk read operations.

**Indexes:** Memory-resident ClickHouse data structures allow reading only the necessary columns, and only the necessary row ranges of those columns.
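
As a rough illustration only (not ClickHouse's actual code), the sketch below models a sparse primary index: one "mark" per granule of rows, and a binary search over the marks to decide which granules need to be read for a key range. ClickHouse's default granule size is 8192 rows.

```go
package main

import (
	"fmt"
	"sort"
)

const granuleSize = 8192 // rows per granule (ClickHouse's default index_granularity)

// marks[i] holds the primary-key value of the first row of granule i.
// The data is sorted by the primary key, so the marks are sorted too.
func granulesToRead(marks []uint64, keyFrom, keyTo uint64) (first, last int) {
	// Last granule that starts at or before keyFrom may still contain matches.
	first = sort.Search(len(marks), func(i int) bool { return marks[i] > keyFrom }) - 1
	if first < 0 {
		first = 0
	}
	// First granule that starts after keyTo; everything before it may contain matches.
	last = sort.Search(len(marks), func(i int) bool { return marks[i] > keyTo })
	return first, last
}

func main() {
	marks := []uint64{0, 10000, 20000, 30000, 40000}
	lo, hi := granulesToRead(marks, 12000, 25000)
	fmt.Printf("read granules [%d, %d), i.e. rows [%d, %d)\n",
		lo, hi, lo*granuleSize, hi*granuleSize)
}
```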

**Data compression:** Storing different values of the same column together often leads to better compression ratios (compared to row-oriented systems) because in real data a column often has the same value, or only a few different values, for neighboring rows. In addition to general-purpose compression, ClickHouse supports [specialized codecs](../../sql-reference/statements/create/table.md/#specialized-codecs) that can make data even more compact.

**Vectorized query execution:** ClickHouse not only stores data in columns but also processes data in columns. This leads to better CPU cache utilization and allows the use of [SIMD](https://en.wikipedia.org/wiki/SIMD) CPU instructions.
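
A simplified sketch of the idea, using hypothetical column names: the filter is evaluated over an entire column block to build a selection vector, and the aggregate then runs over the selected positions. Each pass is a tight loop over one contiguous array, which is friendly to CPU caches and easy for compilers to auto-vectorize with SIMD.

```go
package main

import "fmt"

// filterEq builds a selection vector: the positions where the filter column
// matches. This is one tight loop over a single contiguous array.
func filterEq(status []uint16, want uint16) []int {
	sel := make([]int, 0, len(status))
	for i, s := range status {
		if s == want {
			sel = append(sel, i)
		}
	}
	return sel
}

// sumAt aggregates another column only at the selected positions,
// again as a simple loop over flat arrays.
func sumAt(bytes []uint64, sel []int) uint64 {
	var total uint64
	for _, i := range sel {
		total += bytes[i]
	}
	return total
}

func main() {
	// Hypothetical columns of a web-traffic table.
	status := []uint16{200, 404, 200, 500, 200}
	bytes := []uint64{100, 5, 250, 7, 50}
	sel := filterEq(status, 200)
	fmt.Println(sumAt(bytes, sel)) // 400
}
```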

**Scalability:** ClickHouse can leverage all available CPU cores and disks to execute even a single query, not only on a single server but across all CPU cores and disks of a cluster as well.

### Attention to Low-Level Details

But many other database management systems use similar techniques. What really makes ClickHouse stand out is **attention to low-level details**. Most programming languages provide implementations for the most common algorithms and data structures, but they tend to be too generic to be effective. Every task can be viewed as a landscape with its own characteristics, rather than a problem to solve by throwing in a random implementation. For example, if you need a hash table, here are some key questions to consider:

- Which hash function to choose?
- Collision resolution algorithm: [open addressing](https://en.wikipedia.org/wiki/Open_addressing) vs [chaining](https://en.wikipedia.org/wiki/Hash_table#Separate_chaining)?
- Memory layout: one array for keys and values or separate arrays? Will it store small or large values?
- Fill factor: when and how to resize? How to move values around on resize?
- Will values be removed, and if so, which algorithm handles removal better?
- Will we need fast probing with bitmaps, inline placement of string keys, support for non-movable values, prefetch, and batching?

A hash table is the key data structure for implementing `GROUP BY`, and ClickHouse automatically chooses one of [30+ variations](https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/Aggregator.h) for each specific query.
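
As a toy illustration of why specialization matters (this is not ClickHouse's implementation), here is a linear-probing hash table specialized for counting `uint64` keys: flat arrays instead of per-entry allocations, open addressing instead of chaining, and a fixed sentinel for empty slots. A generic `map[uint64]uint64` would work too, but the specialized layout avoids per-entry overhead and keeps probing sequential in memory.

```go
package main

import "fmt"

// A toy linear-probing hash table specialized for uint64 keys and counts.
// Keys and counts live in flat arrays, so probing stays cache-friendly; the
// zero key is reserved as the "empty slot" sentinel for simplicity.
type U64Counter struct {
	keys   []uint64
	counts []uint64
	mask   uint64
}

func NewU64Counter(sizePow2 int) *U64Counter {
	n := 1 << sizePow2
	return &U64Counter{keys: make([]uint64, n), counts: make([]uint64, n), mask: uint64(n - 1)}
}

func (h *U64Counter) Inc(key uint64) {
	// A cheap multiplicative hash; real systems put much more thought here.
	slot := (key * 0x9E3779B97F4A7C15) & h.mask
	for {
		if h.keys[slot] == key {
			h.counts[slot]++
			return
		}
		if h.keys[slot] == 0 { // empty slot: claim it
			h.keys[slot] = key
			h.counts[slot] = 1
			return
		}
		slot = (slot + 1) & h.mask // linear probing
	}
}

func main() {
	h := NewU64Counter(4) // 16 slots; no resizing in this sketch
	for _, id := range []uint64{42, 7, 42, 42, 7} {
		h.Inc(id)
	}
	for i, k := range h.keys {
		if k != 0 {
			fmt.Printf("key=%d count=%d\n", k, h.counts[i])
		}
	}
}
```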

The same goes for algorithms. For example, when sorting you might consider:

- What will be sorted: an array of numbers, tuples, strings, or structures?
- Is all data available completely in RAM?
- Do we need a stable sort?
- Do we need a full sort? Maybe a partial sort or selecting the n-th element will suffice? (see the sketch after this list)
- How to implement comparisons?
- Are we sorting data that has already been partially sorted?
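
For instance, when a query only needs the top N results, a bounded heap answers the "partial sort" question above in O(n log N) without sorting everything. This is a sketch, not ClickHouse code:

```go
package main

import (
	"container/heap"
	"fmt"
)

// minHeap keeps the N largest values seen so far; its root is the smallest of them.
type minHeap []int

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(int)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topN returns the n largest values without fully sorting the input.
func topN(values []int, n int) []int {
	h := &minHeap{}
	heap.Init(h)
	for _, v := range values {
		if h.Len() < n {
			heap.Push(h, v)
		} else if v > (*h)[0] {
			(*h)[0] = v
			heap.Fix(h, 0)
		}
	}
	return *h
}

func main() {
	fmt.Println(topN([]int{5, 1, 9, 3, 7, 8, 2}, 3)) // the three largest, in heap order
}
```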

Algorithms that rely on the characteristics of the data they are working with can often outperform their generic counterparts. If those characteristics are not known in advance, the system can try several implementations and choose the one that works best at runtime. For example, see an [article on how LZ4 decompression is implemented in ClickHouse](https://habr.com/en/company/yandex/blog/457612/).

Last but not least, the ClickHouse team always scans the Internet for claims of the best implementation, algorithm, or data structure for a given task, and tries them out. Most of these claims turn out to be false, but from time to time you'll indeed find a gem.

:::info Tips for building your own high-performance software
- Keep in mind low-level details when designing your system.
- Design based on hardware capabilities.
- Choose data structures and abstractions based on the needs of the task.
- Provide specializations for special cases.
- Try the new, "best" algorithms that you read about yesterday.
- Choose an algorithm in runtime based on statistics.
- Benchmark on real datasets.
- Test for performance regressions in CI.
- Measure and observe everything.
:::
8 changes: 8 additions & 0 deletions docs/en/concepts/writing-queries.md
@@ -0,0 +1,8 @@
---
sidebar_position: 3
sidebar_label: Writing Queries
---

# Writing Queries in ClickHouse


3 changes: 3 additions & 0 deletions docs/en/coverpages/client-apis.md
@@ -0,0 +1,3 @@
# Client APIs

There are many ways to integrate with ClickHouse.
7 changes: 7 additions & 0 deletions docs/en/coverpages/faq.md
@@ -0,0 +1,7 @@
---
sidebar_position: 0.1
slug: /en/faq
---

# Frequently Asked Questions

29 changes: 29 additions & 0 deletions docs/en/coverpages/go.md
@@ -0,0 +1,29 @@
---
sidebar_label: Introduction
sidebar_position: 1
keywords: [clickhouse, go, client, golang]
slug: /en/integrations/go/intro
description: The Go clients for ClickHouse allow users to connect to ClickHouse using either the Go standard database/sql interface or an optimized native interface.
---

# ClickHouse Go Client

ClickHouse supports two official Go clients. These clients are complementary and intentionally support different use cases.

* [clickhouse-go](https://github.com/ClickHouse/clickhouse-go) - High-level language client which supports either the Go standard database/sql interface or the native interface.
* [ch-go](https://github.com/ClickHouse/ch-go) - Low level client. Native interface only.

clickhouse-go provides a high-level interface, allowing users to query and insert data using row-orientated semantics and batching that are lenient with respect to data types: values will be converted as long as no precision loss would be incurred. ch-go, meanwhile, offers an optimized column-orientated interface for fast data block streaming with low CPU and memory overhead, at the expense of type strictness and more complex usage.

From version 2.3, clickhouse-go utilizes ch-go for low-level functions such as encoding, decoding, and compression. Note that clickhouse-go also supports the Go `database/sql` interface standard. Both clients use the native format for their encoding to provide optimal performance and can communicate over the native ClickHouse protocol. clickhouse-go also supports HTTP as its transport mechanism for cases where users need to proxy or load balance traffic.
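
For illustration, here is a minimal clickhouse-go (v2) sketch, assuming a ClickHouse server on the default native port 9000 of localhost and a throwaway `example` table, that opens a native connection, inserts a batch, and reads the rows back. Switching to the standard `database/sql` interface mainly changes how the connection is opened; the SQL statements stay the same.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	ctx := context.Background()
	conn, err := clickhouse.Open(&clickhouse.Options{
		Addr: []string{"localhost:9000"},
		Auth: clickhouse.Auth{Database: "default", Username: "default", Password: ""},
	})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Throwaway table used only for this example.
	if err := conn.Exec(ctx, `
		CREATE TABLE IF NOT EXISTS example (id UInt64, msg String)
		ENGINE = MergeTree ORDER BY id`); err != nil {
		log.Fatal(err)
	}

	// Batched, row-orientated insert over the native protocol.
	batch, err := conn.PrepareBatch(ctx, "INSERT INTO example")
	if err != nil {
		log.Fatal(err)
	}
	for i := uint64(0); i < 10; i++ {
		if err := batch.Append(i, fmt.Sprintf("row %d", i)); err != nil {
			log.Fatal(err)
		}
	}
	if err := batch.Send(); err != nil {
		log.Fatal(err)
	}

	// Read the data back.
	rows, err := conn.Query(ctx, "SELECT id, msg FROM example ORDER BY id")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var (
			id  uint64
			msg string
		)
		if err := rows.Scan(&id, &msg); err != nil {
			log.Fatal(err)
		}
		fmt.Println(id, msg)
	}
}
```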

When choosing a client library, users should be aware of their respective pros and cons - see Choosing a Client Library.

<div class="adopters-table">

| | Native format | Native protocol | HTTP protocol | Row Orientated API | Column Orientated API | Type flexibility | Compression | Query Placeholders |
|:-------------:|:-------------:|:---------------:|:-------------:|:------------------:|:---------------------:|:----------------:|:-----------:|:------------------:|
| clickhouse-go |       ✅       |        ✅        |       ✅       |         ✅         |           ✅           |        ✅         |      ✅      |         ✅          |
| ch-go         |       ✅       |        ✅        |               |                    |           ✅           |                  |      ✅      |                    |

</div>