Skip to main content

· 2 min read

The following useful query shows which of your executed queries used the most memory. A couple of comments about this query:

  • the results are computed from the past day (now() - toIntervalDay(1))) but you can easily modify the time interval
  • it assumes you have a cluster named default, which is the name of your cluster in ClickHouse Cloud. Change default to the name of your cluster
  • if you do not have a cluster, see the query listed at the end of this article
SELECT
count() as nb_query,
user,
query,
sum(memory_usage) AS memory,
normalized_query_hash
FROM
clusterAllReplicas(default, system.query_log)
WHERE
(event_time >= (now() - toIntervalDay(1)))
AND query_kind = 'Select'
AND type = 'QueryFinish'
and user != 'monitoring-internal'
GROUP BY
normalized_query_hash,
query,
user
ORDER BY
memory DESC;

The response looks like:

┌─nb_query─┬─user────┬─query─────────────────────────────────────────────────────────┬───memory─┬─normalized_query_hash─┐
│ 11 │ default │ select version() │ 46178924 │ 7202516440347714159 │
│ 2 │ default │ SELECT * FROM "system"."table_functions" LIMIT 31 OFFSET 0 │ 8391544 │ 12830067173062987695 │
└──────────┴─────────┴───────────────────────────────────────────────────────────────┴──────────┴───────────────────────┘
note

If you do not have a system.query_log table, then you likely do not have query logging enabled. View the details of the query_log setting for details on how to enable it.

If you do not have a cluster, use can just query your one system.query_log table directly:

SELECT
count() as nb_query,
user,
query,
sum(memory_usage) AS memory,
normalized_query_hash
FROM
system.query_log
WHERE
(event_time >= (now() - toIntervalDay(1)))
AND query_kind = 'Select'
AND type = 'QueryFinish'
and user != 'monitoring-internal'
GROUP BY
normalized_query_hash,
query,
user
ORDER BY
memory DESC;

· One min read

We often get asked about a good schema migration tool for ClickHouse and what is the best practice to manage database schemas in ClickHouse that might change over time? There is no standard schema migration tool for ClickHouse, but we have compiled the following list (in no particular order) of automatic schema migration tools with support for ClickHouse that we know:

· One min read

Question

How do I view the number of active or queued mutations?

Answer

Monitoring the number of active or queued mutations is important if you are performing a lot of ALTER or UPDATE statements on your tables. These queries rewrite data parts and are not atomic - they are ordered by their creation part and applied to each part in that order. You can find more details on mutations in the docs.

Each mutation generates an entry in the system.mutations table. When performing a large number of mutations, you can monitor the count running and queued mutations with this:

SELECT
hostname() AS host,
count()
FROM clusterAllReplicas('default', 'system.mutations')
GROUP BY host;
note

This query assumes you are running a cluster named default, which is the name of your cluster in ClickHouse Cloud. Replace default with the name of your cluster.

If you do not have a cluster, use this command:

SELECT
hostname() AS host,
count()
FROM system.mutations
GROUP BY host;

We also recommend reading this recent blog on updates and deletes.

· 2 min read

ClickHouse Keeper provides the coordination system for data replication and distributed DDL queries execution. ClickHouse Keeper is compatible with ZooKeeper, but it might not be obvious why you should use ClickHouse Keeper instead of ZooKeeper. This article discusses some of the benefits of Keeper.

Answer

ClickHouse Cloud uses clickhouse-keeper at large scale for thousands of services in a multi-tenant environment. We designed and built Keeper so that we could remove our dependency on the Java-based ZooKeeper implementation. ClickHouse Keeper solves many well-known drawbacks of ZooKeeper and makes additional improvements, including:

  • Snapshots and logs consume much less disk space due to better compression
  • No limit on the default packet and node data size (it is 1 MB in ZooKeeper)
  • No zxid overflow issue (it forces a restart for every 2B transactions in ZooKeeper)
  • Faster recovery after network partitions due to the use of a better distributed consensus protocol
  • It uses less memory for the same volume of data
  • It is easier to setup, and it does not require specifying the JVM heap size or a custom garbage collection implementation
  • A few custom commands in the protocol enable faster operations in ReplicatedMergeTree tables
  • A larger coverage by Jepsen tests

In addition, ClickHouse Support has observed a massive decrease in cluster problems in cases with sites who use clickhouse-keeper rather than ZooKeeper.

Check out the Keeper docs page for more details on how to configure and run ClickHouse Keeper.

· 2 min read

If a column is sparse (empty or contains mostly zeros), ClickHouse can encode it in a sparse format and automatically optimize calculations - the data does not require full decompression during queries. In fact, if you know how sparse a column is, you can define its ratio using the ratio_of_defaults_for_sparse_serialization setting to optimize serialization.

This handy query can take a while, but it analyzes every row in your table and determines the ratio of values that are zero (or the default) in every column in the specified table:

SELECT *
APPLY x -> (x = defaultValueOfArgumentType(x)) APPLY avg APPLY x -> round(x, 3)
FROM table_name
FORMAT Vertical

For example, we ran this query above on the environmental sensors dataset table named sensors which has over 20B rows and 19 columns:

SELECT *
APPLY x -> (x = defaultValueOfArgumentType(x)) APPLY avg APPLY x -> round(x, 3)
FROM sensors
FORMAT Vertical

Here is response:


Row 1:
──────
round(avg(equals(sensor_id, defaultValueOfArgumentType(sensor_id))), 3): 0
round(avg(equals(sensor_type, defaultValueOfArgumentType(sensor_type))), 3): 0.159
round(avg(equals(location, defaultValueOfArgumentType(location))), 3): 0
round(avg(equals(lat, defaultValueOfArgumentType(lat))), 3): 0.001
round(avg(equals(lon, defaultValueOfArgumentType(lon))), 3): 0.001
round(avg(equals(timestamp, defaultValueOfArgumentType(timestamp))), 3): 0
round(avg(equals(P1, defaultValueOfArgumentType(P1))), 3): 0.474
round(avg(equals(P2, defaultValueOfArgumentType(P2))), 3): 0.475
round(avg(equals(P0, defaultValueOfArgumentType(P0))), 3): 0.995
round(avg(equals(durP1, defaultValueOfArgumentType(durP1))), 3): 0.999
round(avg(equals(ratioP1, defaultValueOfArgumentType(ratioP1))), 3): 0.999
round(avg(equals(durP2, defaultValueOfArgumentType(durP2))), 3): 1
round(avg(equals(ratioP2, defaultValueOfArgumentType(ratioP2))), 3): 1
round(avg(equals(pressure, defaultValueOfArgumentType(pressure))), 3): 0.83
round(avg(equals(altitude, defaultValueOfArgumentType(altitude))), 3): 1
round(avg(equals(pressure_sealevel, defaultValueOfArgumentType(pressure_sealevel))), 3): 1
round(avg(equals(temperature, defaultValueOfArgumentType(temperature))), 3): 0.532
round(avg(equals(humidity, defaultValueOfArgumentType(humidity))), 3): 0.544

1 row in set. Elapsed: 992.041 sec. Processed 20.69 billion rows, 1.39 TB (20.86 million rows/s., 1.40 GB/s.)

From the results above:

  • the sensor_id columns is not sparse at all. In fact, every row has a non-zero value
  • the sensor_type is only sparse about 15.9% of the time
  • the P0 column is very sparse: 99.9% of the values are zero
  • the pressure column is quite sparse at 83%
  • and temperature column has 53.2% of its values missing or zero

Like we said, it's a handy query for computing how sparse your columns are in a ClickHouse table!

· 2 min read

Question

How can I validate that two queries return the same resultsets?

Answer

You can use the below approach:

WITH
(
SELECT sum(cityHash64(*))
FROM
(
-- your query 1 here
-- SELECT ...
)
) AS q1_resultset_hash,
(
SELECT sum(cityHash64(*))
FROM
(
-- your query 2 here
-- SELECT ...
)
) AS q2_resultset_hash
SELECT equals(q1_resultset_hash,q2_resultset_hash) as Q1_equals_Q2

The example uses a CTE to calculate sums of the cityHash value of each row in these two queries and will return 1 if the two resultsets are identical.

Using some integers sequence data and some pretty formatting:

WITH
(
SELECT sum(cityHash64(*))
FROM
(
SELECT *
FROM numbers(10)
ORDER BY number DESC
)
) AS q1_resultset_hash,
(
SELECT sum(cityHash64(*))
FROM
(
SELECT *
FROM numbers(10)
ORDER BY number ASC
)
) AS q2_resultset_hash
SELECT q1_resultset_hash = q2_resultset_hash AS Q1_equals_Q2
FORMAT Pretty

will return:

┏━━━━━━━━━━━━━━┓
┃ Q1_equals_Q2 ┃
┡━━━━━━━━━━━━━━┩
│ 1 │
└──────────────┘

While this can be handy in many scenarios, it can't be considered as a silver bullet to validate equality of resultsets for all types and there are caveats to using it, for example if any row contains NULL values the above approach will fail.

· One min read

Cause of the Error

This error occurs while trying to connect to a ClickHouse server using clickhouse-client. The cause of the error is either:

  • the client configuration file config.xml is missing the root certificate in the machine CA default store, or
  • there is a self-signed or internal CA certificate that is not configured

Solution

If using an internal or self-signed CA, configure the CA root certificate in config.xml in the client directory (e.g. /etc/clickhouse-client) and disable the loading of the default root CA certificates from the default location.

Here is an example configuration:

<openSSL>
<client>
<loadDefaultCAFile>false</loadDefaultCAFile>
<caConfig>/etc/clickhouse-server/certs/marsnet_ca.crt</caConfig>
<cacheSessions>true</cacheSessions>
<disableProtocols>sslv2,sslv3</disableProtocols>
<preferServerCiphers>true</preferServerCiphers>
<invalidCertificateHandler>
<name>RejectCertificateHandler</name>
</invalidCertificateHandler>
</client>
</openSSL>

Additional resources

View https://clickhouse.com/docs/en/interfaces/cli/#configuration_files

· 2 min read

When this error occurs, a table shows as readonly and the error states intersecting parts. You can see the error in the logs or by

SELECT *
FROM system.replicas
WHERE is_readonly = 1

The error message looks like:

Code: 49. DB::Exception: Part XXXXX intersects previous part YYYYY. It is a bug or a result of manual intervention in the ZooKeeper data. (LOGICAL_ERROR) (version 21.12.4.1 (official build))

Cause of the Error

This error can be caused by a race condition between mergeSelectingTask and queue reinitialization.

Solution

Execute the following queries on all replicas:

DETACH TABLE table_name;  -- Required for DROP REPLICA

SYSTEM DROP REPLICA 'replica_name' FROM ZK PATH '/table_path_in_zk/'; -- It will remove everything from /table_path_in_zk

ATTACH TABLE table_name; -- Table will be in readonly mode, because there is no metadata in ZK

Then execute the following on all replicas:

SYSTEM RESTORE REPLICA table_name;  -- It will detach all partitions, re-create metadata in ZK (like it's new empty table), and then attach all partitions back

SYSTEM SYNC REPLICA table_name; -- Wait for replicas to synchronize parts. Also it's recommended to check `system.detached_parts` on all replicas after recovery is finished.
tip

You should upgrade to the latest version of ClickHouse

Additional resources

Related PRs and GitHub issues:

Versions affected:

ClickHouse v 22.12 and prior

· 3 min read

TTL is going to be eventually applied. What does that mean? The MergeTree table setting merge_with_ttl_timeout sets the minimum delay in seconds before repeating a merge with delete TTL. The default value is 14400 seconds (4 hours). But that is just the minimum delay, it can take longer until a merge for delete TTL is triggered.

You can view all of your current TTL settings (like merge_with_ttl_timeout) with this query:

SELECT *
FROM system.merge_tree_settings
WHERE name like '%ttl%'

The response looks like:

┌─name───────────────────────────────────────────────────────────┬─value───┬─changed─┬─description────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─min──┬─max──┬─readonly─┬─type───┐
│ max_replicated_merges_with_ttl_in_queue │ 1 │ 0 │ How many tasks of merging parts with TTL are allowed simultaneously in ReplicatedMergeTree queue. │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0 │ UInt64 │
│ max_number_of_merges_with_ttl_in_pool │ 2 │ 0 │ When there is more than specified number of merges with TTL entries in pool, do not assign new merge with TTL. This is to leave free threads for regular merges and avoid "Too many parts" │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0 │ UInt64 │
│ merge_tree_clear_old_broken_detached_parts_ttl_timeout_seconds │ 2592000 │ 1 │ Remove old broken detached parts in the background if they remained intouched for a specified by this setting period of time. │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0 │ UInt64 │
│ merge_with_ttl_timeout │ 14400 │ 0 │ Minimal time in seconds, when merge with delete TTL can be repeated. │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0 │ Int64 │
│ merge_with_recompression_ttl_timeout │ 14400 │ 0 │ Minimal time in seconds, when merge with recompression TTL can be repeated. │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0 │ Int64 │
│ ttl_only_drop_parts │ 0 │ 0 │ Only drop altogether the expired parts and not partially prune them. │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0 │ Bool │
│ materialize_ttl_recalculate_only │ 0 │ 0 │ Only recalculate ttl info when MATERIALIZE TTL │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0 │ Bool │
└────────────────────────────────────────────────────────────────┴─────────┴─────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────┴──────┴──────────┴────────┘

You can use SHOW CREATE TABLE to check if your table contains TTL rules, as well as if any of the table SETTINGS modified the values of the settings above:

SHOW CREATE TABLE <TableName>

Force a TTL rule to be applied

This is not the most elegant solution, but you can explicitly call MATERIALIZE TTL, which forces all the TTL rules of a table to be materialized:

ALTER TABLE my_table
MATERIALIZE TTL

Background threads affecting TTL

It is possible that your TTL rules are not being applied because there are not enough working threads in the background pool. For example, if you insert data intensively, then the whole background pool might be utilized for normal merges. However, you can increase the background pool size.

You can check your current background pool size with this query:

SELECT *
FROM system.settings
WHERE name = 'background_pool_size';

The response looks like:

┌─name─────────────────┬─value─┬─changed─┬─description─────────────────────┬─min──┬─max──┬─readonly─┬─type───┬─default─┬─alias_for─┐
│ background_pool_size │ 16 │ 0 │ Obsolete setting, does nothing. │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0 │ UInt64 │ 16 │ │
└──────────────────────┴───────┴─────────┴─────────────────────────────────┴──────┴──────┴──────────┴────────┴─────────┴───────────┘

Check the docs for how to modify the background_pool_size setting, which is configured as:

<background_pool_size>16</background_pool_size>

You can check the current background pool activity with this query:

SELECT *
FROM system.metrics
WHERE metric like 'Background%'