ClickHouse Table Migration in Practice: the remote Approach

1 Introduction

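ClickHouse is a column-oriented database management system (DBMS) for online analytical processing (OLAP). Many of our internal reports and data dashboards are built on it. This post walks through the complete process of migrating ClickHouse tables with the remote approach; if you spot any mistakes, corrections are welcome.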

The SQL statements below were used for testing; adapt them to your own environment before using them.

2 Background

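We use JCHDB, the distributed ClickHouse database service provided by JD Cloud. The original ClickHouse cluster was shared by two departments, and for reasons of business ownership, management, and cost allocation it had to be split. The original cluster contains two tables: business A's order table and business B's dashboard data table. After the split, business B's dashboard table has to be migrated to a new ClickHouse cluster.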

3 Migration Methods

Our research turned up the following migration methods:

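1. Migrate the data with the remote table function

2. Export the data to files and import them

3. Export and import via CSV files

4. Stream the export and import through a Linux pipe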

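After talking with the colleagues who operate the cloud JCHDB service and doing some research, we chose the remote approach. The data volume is still small, which suits remote well, as long as its prerequisites are met (for example, the new cluster must be able to reach the old one over the network). For large data volumes, consider one of the other methods.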

Before using remote, increase the max_partitions_per_insert_block setting so that the statement does not fail with an error like the following:

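Error:
Too many partitions for single INSERT block (more than 100). The limit is controlled by 'max_partitions_per_insert_block' setting

Cause:
ClickHouse does not allow the data in a single INSERT block to span too many partitions. The limit is set by max_partitions_per_insert_block (100 by default, matching the message above), and raising this setting resolves the error.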
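Before raising the limit, it can help to check how many partitions the copy will actually touch. The sketch below assumes it is run in clickhouse-client on the new cluster, and that the table is partitioned by toYYYYMMDD(dt) like the test table in step 4. 'old_cluster_address', 'user' and 'password' are placeholders, and 1000 is only an illustrative value:

```sql
-- Upper bound on the number of partitions one INSERT ... SELECT can hit
-- (assumes daily partitions, i.e. PARTITION BY toYYYYMMDD(dt)).
SELECT uniqExact(toYYYYMMDD(dt)) AS partitions
FROM remote('old_cluster_address', old_database.test_ck_01, 'user', 'password');

-- Raise the limit for the current session only; choose a value at least as large
-- as the number above (1000 is illustrative). A value of 0 removes the limit entirely.
SET max_partitions_per_insert_block = 1000;
```

The limit is checked per inserted block rather than per statement, so the distinct-partition count is an upper bound. A SET lasts only for the current session, so run it in the same session as the migration INSERT.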

4 Steps

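Create the new ClickHouse cluster: request it through the cloud management platform. Estimate the business's future data volume first, then fill in the capacity configuration of the request accordingly.
Create the database: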

```sql
CREATE DATABASE IF NOT EXISTS new_database ON CLUSTER default;
```

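Note: the trailing ON CLUSTER default is required. It makes the statement run on every node of the cluster named default, so the database exists on all of them.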
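One way to confirm that ON CLUSTER really created the database everywhere is to query system.databases on every node. This sketch assumes the cluster is named default, as in the statement above. It also assumes the server version provides the clusterAllReplicas table function; older versions may only have cluster(), which queries one replica per shard:

```sql
-- Expect one row per node; a missing host means the DDL did not reach it.
SELECT hostName() AS host, name
FROM clusterAllReplicas('default', system.databases)
WHERE name = 'new_database'
ORDER BY host;
```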

Create the tables:

Write the SQL according to the actual columns and table engine. Reference: https://clickhouse.com/docs/zh/sql-reference/statements/create/table
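Since the new table should match the old one, a convenient starting point is the old table's own DDL. The sketch below reads it through remote from the new cluster. old_table_local and old_table are hypothetical names for the old cluster's local and Distributed tables, and the address and credentials are placeholders:

```sql
-- Fetch the original CREATE statements from the old cluster's system.tables.
-- old_table_local / old_table are placeholder names; replace them with the real ones.
SELECT name, create_table_query
FROM remote('old_cluster_address', system.tables, 'user', 'password')
WHERE database = 'old_database'
  AND name IN ('old_table_local', 'old_table');
```

Before running the returned statements on the new cluster, adjust the database name, the ZooKeeper path in the Replicated* engine, and the cluster name to fit the new environment.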

Create the test tables (a replicated local table on every node, plus a Distributed table on top of it):

```sql
-- Local table on every node: replicated, deduplicating rows with the same id during merges
CREATE TABLE IF NOT EXISTS new_database.test_ck_01_local ON CLUSTER default
(
    id String COMMENT 'random primary key',
    dt Date COMMENT 'partition column'
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/new_database/tables/{shard}/test_ck_01_local', '{replica}')
PARTITION BY toYYYYMMDD(dt)
ORDER BY id;

-- Distributed table over the local tables; rows are spread across shards at random
CREATE TABLE IF NOT EXISTS new_database.test_ck_01 ON CLUSTER default AS new_database.test_ck_01_local
ENGINE = Distributed(default, new_database, test_ck_01_local, rand());
```
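Once both tables exist, one way to check that the new schema matches the source is to describe both sides from the new cluster. The sketch below uses the same placeholder address and credentials:

```sql
-- Column names and types of the source table on the old cluster.
DESCRIBE TABLE remote('old_cluster_address', old_database.test_ck_01, 'user', 'password');

-- Column names and types of the new Distributed table; they should match the output above.
DESCRIBE TABLE new_database.test_ck_01;
```

Column order matters as well as names and types. The migration statements use INSERT ... SELECT *, which maps columns by position.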

Write test data:

Run these inserts on the original ClickHouse cluster:

```sql
INSERT INTO old_database.test_ck_01 VALUES ('1', now());
INSERT INTO old_database.test_ck_01 VALUES ('2', now());
```

Insert more rows as needed.
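To make the test closer to the real migration, the test data can span several partitions. The sketch below uses the numbers table function and assumes the old test table has the same two columns (id String, dt Date). The row count (1000) and the spread over 10 days are arbitrary:

```sql
-- Run on the old cluster: 1000 rows spread across the last 10 daily partitions.
INSERT INTO old_database.test_ck_01
SELECT
    toString(number) AS id,        -- id column (String)
    today() - (number % 10) AS dt  -- dt column (Date): one of the last 10 days
FROM numbers(1000);
```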

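From a client on the new ClickHouse cluster, run the following query. Replace old_cluster_address, user and password with the old cluster's address and credentials. If the query fails, the new cluster most likely cannot reach the old one over the network.

```sql
SELECT * FROM remote('old_cluster_address', old_database.test_ck_01, 'user', 'password');
```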
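remote uses ClickHouse's native TCP protocol, so the address is usually host:port with the native port (9000 by default). For example, 'old-ck.example.com:9000' is a hypothetical host name; on a managed service the actual host and port may differ. A lighter connectivity check that does not depend on any business table might look like this:

```sql
-- Returns a single row (dummy = 0) if the new cluster can reach and log in to the old one.
SELECT * FROM remote('old-ck.example.com:9000', system.one, 'user', 'password');

-- Baseline row count of the source table, handy for the verification step later.
SELECT count() FROM remote('old-ck.example.com:9000', old_database.test_ck_01, 'user', 'password');
```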

Test the migration command:

```sql
INSERT INTO new_database.test_ck_01
SELECT * FROM remote('old_cluster_address', old_database.test_ck_01, 'user', 'password');
```
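After the test migration, a side-by-side comparison of the two tables shows whether everything arrived. The sketch below runs on the new cluster, with the same placeholder address and credentials:

```sql
-- One row per side; the two rows should be identical after a complete copy.
SELECT 'old' AS side, count() AS row_count, uniqExact(id) AS distinct_ids, min(dt) AS min_dt, max(dt) AS max_dt
FROM remote('old_cluster_address', old_database.test_ck_01, 'user', 'password')
UNION ALL
SELECT 'new', count(), uniqExact(id), min(dt), max(dt)
FROM new_database.test_ck_01;
```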

The production migration steps are:

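•Switch the ClickHouse address in the code to the new cluster ahead of time;

•Ask the owner of the real-time big-data jobs to stop Flink and any other write tasks;

•Migrate the data to the new ClickHouse cluster (using the migration statement above);

•Ask the owner of the real-time big-data jobs to restart Flink and the other write tasks;

•Verify that the data has been synchronized to the new ClickHouse cluster;

•Deploy or restart the application in the gray-release or pre-production environment, and check through calls from the code that queries against the new ClickHouse cluster work correctly.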
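Before copying, it is worth confirming that the write jobs have really stopped. One simple heuristic is to run the same aggregate on the old cluster twice, a minute or so apart. This sketch assumes a Date column dt like the test table; old_table is a placeholder name:

```sql
-- Run twice on the old cluster; a growing row_count or a newer latest_dt means writes are still arriving.
SELECT count() AS row_count, max(dt) AS latest_dt
FROM old_database.old_table;
```

On a ReplacingMergeTree table, a background merge can lower the count slightly even when nothing is being written. A count that grows is therefore the signal to watch for.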

Migration statement (run in a client on the target ClickHouse cluster):

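```sql
INSERT INTO new_database.table_to_migrate
SELECT * FROM remote('old_cluster_address', old_database.old_table, 'user', 'password');
```

Here table_to_migrate and old_table stand for the real table names, and old_cluster_address for the old cluster's address.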
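Instead of raising max_partitions_per_insert_block, you can also split the copy into date ranges so that each statement touches only a bounded number of partitions. This is a hypothetical example: it assumes a Date column dt with daily partitions, and the dates are placeholders. The WHERE clause is sent to the old cluster together with the query, so only the selected range crosses the network:

```sql
-- Each batch covers one month, i.e. at most 31 daily partitions (below the default limit of 100).
INSERT INTO new_database.table_to_migrate
SELECT *
FROM remote('old_cluster_address', old_database.old_table, 'user', 'password')
WHERE dt >= '2023-01-01' AND dt < '2023-02-01';

INSERT INTO new_database.table_to_migrate
SELECT *
FROM remote('old_cluster_address', old_database.old_table, 'user', 'password')
WHERE dt >= '2023-02-01' AND dt < '2023-03-01';

-- ...continue month by month until the latest partition has been copied.
```

Smaller batches also make it possible to verify each range separately before moving on to the next.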

Check whether the row counts match:

```sql
SELECT count(1) FROM new_database.table_to_migrate FINAL;
```

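Note: right after the migration, the row counts may not match. ReplacingMergeTree removes duplicate rows only during background merges, and those merges may have progressed differently on the two clusters. Query with FINAL, which merges duplicate rows at query time.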
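If the totals still differ, a per-partition comparison narrows down where. The sketch below runs on the new cluster and assumes daily partitions on dt, with id as the deduplication key. It counts distinct ids instead of using FINAL, so both sides are measured the same way regardless of how far background merges have progressed. Table names, address and credentials are placeholders:

```sql
-- Lists only the partitions whose distinct-id counts differ between the two clusters.
SELECT
    p,
    old_ids,
    new_ids
FROM
(
    SELECT toYYYYMMDD(dt) AS p, uniqExact(id) AS old_ids
    FROM remote('old_cluster_address', old_database.old_table, 'user', 'password')
    GROUP BY p
) AS o
FULL OUTER JOIN
(
    SELECT toYYYYMMDD(dt) AS p, uniqExact(id) AS new_ids
    FROM new_database.table_to_migrate
    GROUP BY p
) AS n USING (p)
WHERE old_ids != new_ids
ORDER BY p;
```

With the default join_use_nulls = 0, a partition that exists on only one side shows 0 for the other side, so missing partitions are listed too.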

5 References

Official documentation: https://clickhouse.com/docs/zh

JD Cloud ClickHouse (JCHDB) documentation: https://docs.jdcloud.com/cn/jchdb/product-overview

Using remote: https://blog.csdn.net/u010180815/article/details/115070235

6 Summary

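That covers the hands-on process of migrating ClickHouse tables with the remote approach. With it, a table can be moved from one ClickHouse cluster to another while pausing writes only for the duration of the copy.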

Author: Liu Dengzhong, JD Logistics

Source: JD Cloud Developer Community
