Merge pull request ClickHouse#525 from ClickHouse/DanRoscigno-patch-7
Update why-clickhouse-is-so-fast.md
DanRoscigno authored Nov 8, 2022
2 parents c95bfcc + 1879620 commit d8bd043
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions docs/en/faq/general/why-clickhouse-is-so-fast.md
@@ -21,15 +21,15 @@ It was designed to be fast. Query execution performance has always been a top pr

### Architecture choices

-ClickHouse was initially built as a prototype to do just a single task well: to filter and aggregate data as fast as possible. That’s what needs to be done to build a typical analytical report and that’s what a typical [GROUP BY](../../sql-reference/statements/select/group-by.md) query does. ClickHouse team has made several high-level decisions that combined made achieving this task possible:
+ClickHouse was initially built as a prototype to do just a single task well: to filter and aggregate data as fast as possible. That’s what needs to be done to build a typical analytical report, and that’s what a typical [GROUP BY](../../sql-reference/statements/select/group-by.md) query does. The ClickHouse team has made several high-level decisions that, when combined, made achieving this task possible:

-**Column-oriented storage:** Source data often contain hundreds or even thousands of columns, while a report can use just a few of them. The system needs to avoid reading unnecessary columns, or most expensive disk read operations would be wasted.
+**Column-oriented storage:** Source data often contain hundreds or even thousands of columns, while a report can use just a few of them. The system needs to avoid reading unnecessary columns to avoid expensive disk read operations.

-**Indexes:** ClickHouse keeps data structures in memory that allows reading not only used columns but only necessary row ranges of those columns.
+**Indexes:** Memory resident ClickHouse data structures allow the reading of only the necessary columns, and only the necessary row ranges of those columns.

-**Data compression:** Storing different values of the same column together often leads to better compression ratios (compared to row-oriented systems) because in real data column often has the same or not so many different values for neighboring rows. In addition to general-purpose compression, ClickHouse supports [specialized codecs](../../sql-reference/statements/create/table.md/#specialized-codecs) that can make data even more compact.
+**Data compression:** Storing different values of the same column together often leads to better compression ratios (compared to row-oriented systems) because in real data a column often has the same, or not so many different, values for neighboring rows. In addition to general-purpose compression, ClickHouse supports [specialized codecs](../../sql-reference/statements/create/table.md/#specialized-codecs) that can make data even more compact.

-**Vectorized query execution:** ClickHouse not only stores data in columns but also processes data in columns. It leads to better CPU cache utilization and allows for [SIMD](https://en.wikipedia.org/wiki/SIMD) CPU instructions usage.
+**Vectorized query execution:** ClickHouse not only stores data in columns but also processes data in columns. This leads to better CPU cache utilization and allows for [SIMD](https://en.wikipedia.org/wiki/SIMD) CPU instructions usage.

**Scalability:** ClickHouse can leverage all available CPU cores and disks to execute even a single query. Not only on a single server but all CPU cores and disks of a cluster as well.
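
The column-oriented storage and index points in the text above can be illustrated with a small toy model. This is a minimal sketch, not ClickHouse's actual storage format: the table contents, the `GRANULE` size, and the helper names `row_range` and `sum_clicks` are all hypothetical and chosen only for illustration. It assumes the table is sorted by `day`, keeps one in-memory index mark per block of rows, and reads only the two columns the query needs.

```python
from bisect import bisect_left, bisect_right

GRANULE = 4  # hypothetical number of rows covered by one index mark

# Toy column-oriented table: one list per column, rows sorted by "day".
columns = {
    "day":    [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "clicks": [5, 3, 8, 1, 4, 4, 7, 2, 6, 9, 1, 3],
    "url":    [f"/page{i}" for i in range(12)],  # never touched by the query below
}

# Sparse in-memory index: the first "day" value of every granule.
marks = columns["day"][::GRANULE]  # [1, 3, 5]


def row_range(lo, hi):
    """Return [start, stop) row bounds of the granules that may hold lo <= day <= hi."""
    # The granule just before the first mark >= lo may still end with the value lo.
    first = max(bisect_left(marks, lo) - 1, 0)
    # Granules whose first value is already > hi cannot contain matching rows.
    last = bisect_right(marks, hi)
    return first * GRANULE, min(last * GRANULE, len(columns["day"]))


def sum_clicks(lo, hi):
    """Sum "clicks" for lo <= day <= hi, reading only two columns and the selected rows."""
    start, stop = row_range(lo, hi)
    days = columns["day"][start:stop]
    clicks = columns["clicks"][start:stop]
    total = sum(c for d, c in zip(days, clicks) if lo <= d <= hi)
    return total, (start, stop)
```

Calling `sum_clicks(4, 4)` in this sketch returns `(9, (4, 8))`: only rows 4 to 7 of the `day` and `clicks` columns are scanned, and the `url` column is never read.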
