[Best Practice] Exporting and Importing MongoDB Data


Editor's summary

**Efficiency analysis of restoring a database with mongorestore**
**1. Index creation during the restore**
- Index creation at the end of the restore was extremely slow and ultimately did not take effect. Possible causes: the concurrent import slowed the index builds, the indexed collections are large enough that the builds need a lot of memory, or the index definitions were not accepted.
**2. Data restore efficiency**
- Recommended parameters: --numInsertionWorkersPerCollection=8, --noIndexRestore, --bypassDocumentValidation.
**3. Index creation after the data restore**
- Restore the data without indexes, then create the indexes separately. The restore command used in the final test was:
- mongorestore --port=20000 -uadmin -p'passwd' --authenticationDatabase=admin --numInsertionWorkersPerCollection=10 --bypassDocumentValidation --nsInclude="likingtest.*" --nsFrom="likingtest.*" --nsTo="likingtest.*" --noIndexRestore /u01/nfs/xxxxx_mongodb/10.1.1.1/20230914

Article

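First, the data scale of this 3-node MongoDB cluster along several dimensions:
1. dataSize: 1.9T
2. storageSize: 600G
3. Full backup with the compression switch on: 186G, took 8h
4. Full backup with the compression switch off: 1.8T, took 4h27m
The export syntax is simple and is not repeated here. This article focuses on how the import was optimized and ends with the best practice for importing.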
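
The article does not show its export command. As a rough sketch only, the two backup variants above might look like the following, assuming the "compression switch" refers to mongodump's --gzip option. The connection flags are copied from the mongorestore commands later in the article; the output directory is a hypothetical placeholder, and the function prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: print (not run) a mongodump command for a full backup.
# Assumptions: --gzip is the "compression switch"; the output path is hypothetical.
build_dump_cmd() (
    out_dir=$1      # hypothetical target directory for the dump
    compress=$2     # "gzip" adds --gzip; any other value gives an uncompressed dump
    cmd="mongodump --port=20000 -uadmin -p'passwd' --authenticationDatabase=admin --out=$out_dir"
    if [ "$compress" = "gzip" ]; then
        cmd="$cmd --gzip"
    fi
    printf '%s\n' "$cmd"
)

build_dump_cmd /path/to/dumpdir/20230913 none
build_dump_cmd /path/to/dumpdir/20230913 gzip
```

A dump taken with --gzip also has to be restored with mongorestore --gzip. The restore logs below read plain .bson files, so the tests in this article appear to use the uncompressed backup.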

■ 2023-09-13T20:00 Test 1: import with 4 insertion workers

mongorestore --port=20000 -uadmin -p'passwd' --authenticationDatabase=admin --numInsertionWorkersPerCollection=4 --bypassDocumentValidation -d likingtest /u01/nfs/xxxxx_mongodb/10.1.1.1/20230913/likingtest >> 10.2.2.2.log 2>&1 &
tail -100f /u01/nfs/xxxxx_mongodb/10.1.1.1/20230913/10.2.2.2.log
Output of the import above:
2023-09-13T21:59:55.452+0800    The --db and --collection flags are deprecated for this use-case; please use --nsInclude instead, i.e. with --nsInclude=${DATABASE}.${COLLECTION}
2023-09-13T21:59:55.452+0800    building a list of collections to restore from /u01/nfs/xxxxx_mongodb/10.1.1.1/20230913/likingtest dir
2023-09-13T21:59:55.466+0800    reading metadata for likingtest.oprceConfiguration from /u01/nfs/xxxxx_mongodb/10.1.1.1/20230913/likingtest/oprceConfiguration.metadata.json
2023-09-13T21:59:55.478+0800    reading metadata for likingtest.oprceDataObj from /u01/nfs/xxxxx_mongodb/10.1.1.1/20230913/likingtest/oprceDataObj.metadata.json
2023-09-13T21:59:55.491+0800    reading metadata for likingtest.oprcesDataObjInit from /u01/nfs/xxxxx_mongodb/10.1.1.1/20230913/likingtest/oprcesDataObjInit.metadata.json
2023-09-13T21:59:55.503+0800    reading metadata for likingtest.role from /u01/nfs/xxxxx_mongodb/10.1.1.1/20230913/likingtest/role.metadata.json
2023-09-13T21:59:55.508+0800    reading metadata for likingtest.activityConfiguration from /u01/nfs/xxxxx_mongodb/10.1.1.1/20230913/likingtest/activityConfiguration.metadata.json
2023-09-13T21:59:55.511+0800    reading metadata for likingtest.history_task from /u01/nfs/xxxxx_mongodb/10.1.1.1/20230913/likingtest/history_task.metadata.json
2023-09-13T21:59:55.512+0800    reading metadata for likingtest.resOutRelDataSnapshot from /u01/nfs/xxxxx_mongodb/10.1.1.1/20230913/likingtest/resOutRelDataSnapshot.metadata.json
2023-09-13T21:59:55.520+0800    reading metadata for likingtest.snapshotResource from /u01/nfs/xxxxx_mongodb/10.1.1.1/20230913/likingtest/snapshotResource.metadata.json
2023-09-13T21:59:55.524+0800    reading metadata for likingtest.oprceDataObjDraft from /u01/nfs/xxxxx_mongodb/10.1.1.1/20230913/likingtest/oprceDataObjDraft.metadata.json
2023-09-13T21:59:55.526+0800    reading metadata for likingtest.oprceDataObjInit from /u01/nfs/xxxxx_mongodb/10.1.1.1/20230913/likingtest/oprceDataObjInit.metadata.json
2023-09-13T21:59:55.761+0800    restoring likingtest.snapshotResource from /u01/nfs/xxxxx_mongodb/10.1.1.1/20230913/likingtest/snapshotResource.bson
...
2023-09-13T22:00:01.451+0800    [........................]      likingtest.oprceDataObj   408MB/1205GB    (0.0%)
...
2023-09-13T21:59:58.323+0800    finished restoring likingtest.oprceDataObjDraft (1559 documents, 0 failures)
2023-09-13T22:00:01.034+0800    finished restoring likingtest.resOutRelDataSnapshot (34426 documents, 0 failures)
2023-09-13T22:00:01.559+0800    finished restoring likingtest.history_task (3629 documents, 0 failures)
2023-09-13T22:00:02.086+0800    finished restoring likingtest.activityConfiguration (974 documents, 0 failures)
2023-09-13T22:00:02.293+0800    finished restoring likingtest.oprceConfiguration (162 documents, 0 failures)
2023-09-13T22:00:02.529+0800    finished restoring likingtest.oprcesDataObjInit (4 documents, 0 failures)
2023-09-13T22:00:02.857+0800    finished restoring likingtest.role (10 documents, 0 failures)
2023-09-13T22:00:29.153+0800    [########################]  likingtest.snapshotResource  2.04GB/2.04GB  (100.0%)
2023-09-13T22:00:29.155+0800    finished restoring likingtest.snapshotResource (50320 documents, 0 failures)
...
2023-09-14T00:18:58.451+0800    [############............]      likingtest.oprceDataObj  651GB/1205GB   (54.0%)
2023-09-14T00:18:59.857+0800    [########################]  likingtest.oprceDataObjInit  635GB/635GB  (100.0%)
2023-09-14T00:18:59.888+0800    finished restoring likingtest.oprceDataObjInit (43776648 documents, 0 failures)
...
2023-09-14T02:05:58.904+0800    [########################]      likingtest.oprceDataObj  1205GB/1205GB  (100.0%)
2023-09-14T02:05:58.937+0800    finished restoring likingtest.oprceDataObj (53311330 documents, 0 failures)
2023-09-14T02:05:58.945+0800    no indexes to restore for collection likingtest.activityConfiguration
2023-09-14T02:05:58.945+0800    no indexes to restore for collection likingtest.history_task
2023-09-14T02:05:58.945+0800    restoring indexes for collection likingtest.oprcesDataObjInit from metadata
2023-09-14T02:05:58.976+0800    index: &idx.IndexDocument{Options:primitive.M{"name":"flowId_1_activityConfiguration.activityNameEn_1", "ns":"likingtest.oprcesDataObjInit", "v":2}, Key:primitive.D{primitive.E{Key:"flowId", Value:1}, primitive.E{Key:"activityConfiguration.activityNameEn", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2023-09-14T02:05:58.976+0800    index: &idx.IndexDocument{Options:primitive.M{"name":"oprceInfo.oprceInstID_1_activityInfo.activityInstID_1_workitemInfo.workItemID_1", "ns":"likingtest.oprcesDataObjInit", "v":2}, Key:primitive.D{primitive.E{Key:"oprceInfo.oprceInstID", Value:1}, primitive.E{Key:"activityInfo.activityInstID", Value:1}, primitive.E{Key:"workitemInfo.workItemID", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2023-09-14T02:05:58.976+0800    no indexes to restore for collection likingtest.role
2023-09-14T02:05:58.976+0800    no indexes to restore for collection likingtest.snapshotResource
2023-09-14T02:05:58.976+0800    no indexes to restore for collection likingtest.oprceDataObjDraft
2023-09-14T02:05:58.976+0800    restoring indexes for collection likingtest.oprceDataObjInit from metadata
2023-09-14T02:05:58.976+0800    index: &idx.IndexDocument{Options:primitive.M{"name":"oprceInfo.oprceInstID_1_activityInfo.activityInstID_1_workitemInfo.workItemID_1", "ns":"likingtest.oprceDataObjInit", "v":2}, Key:primitive.D{primitive.E{Key:"oprceInfo.oprceInstID", Value:1}, primitive.E{Key:"activityInfo.activityInstID", Value:1}, primitive.E{Key:"workitemInfo.workItemID", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2023-09-14T02:05:58.976+0800    index: &idx.IndexDocument{Options:primitive.M{"name":"flowNo_1", "ns":"likingtest.oprceDataObjInit", "v":2}, Key:primitive.D{primitive.E{Key:"flowNo", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2023-09-14T02:05:58.976+0800    no indexes to restore for collection likingtest.oprceConfiguration
2023-09-14T02:05:58.976+0800    no indexes to restore for collection likingtest.resOutRelDataSnapshot
2023-09-14T02:05:58.976+0800    restoring indexes for collection likingtest.oprceDataObj from metadata
2023-09-14T02:05:58.976+0800    index: &idx.IndexDocument{Options:primitive.M{"name":"flowId_1_activityConfiguration.activityNameEn_1", "ns":"likingtest.oprceDataObj", "v":2}, Key:primitive.D{primitive.E{Key:"flowId", Value:1}, primitive.E{Key:"activityConfiguration.activityNameEn",Value:1}}, PartialFilterExpression:primitive.D(nil)}
2023-09-14T02:05:58.976+0800    index: &idx.IndexDocument{Options:primitive.M{"name":"flowNo_1", "ns":"likingtest.oprceDataObj", "v":2}, Key:primitive.D{primitive.E{Key:"flowNo", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2023-09-14T02:05:58.976+0800    index: &idx.IndexDocument{Options:primitive.M{"name":"oprceInfo.oprceInstID_1_activityInfo.activityInstID_1_workitemInfo.workItemID_1", "ns":"likingtest.oprceDataObj", "v":2}, Key:primitive.D{primitive.E{Key:"oprceInfo.oprceInstID", Value:1}, primitive.E{Key:"activityInfo.activityInstID", Value:1}, primitive.E{Key:"workitemInfo.workItemID", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2023-09-14T02:05:58.976+0800    index: &idx.IndexDocument{Options:primitive.M{"name":"flowId_1_activityConfiguration.activityNameEn_1", "ns":"likingtest.oprceDataObjInit", "v":2}, Key:primitive.D{primitive.E{Key:"flowId", Value:1}, primitive.E{Key:"activityConfiguration.activityNameEn", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2023-09-14T03:45:47.152+0800    97179062 document(s) restored successfully. 0 document(s) failed to restore.

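Observations:
1. With the concurrency parameter --numInsertionWorkersPerCollection=4 and the validation parameter --bypassDocumentValidation, restore speed improved greatly: the 1.2T collection oprceDataObj dropped from about 12h with the default restore settings to about 4h.
2. Once all data is restored, the indexes are restored last, and that still takes a while: 1h40m in this run. [Note: it did not actually succeed; the indexes never took effect.]
3. In newer versions, the -d and -c parameters should be replaced with --nsInclude, --nsFrom= and --nsTo=.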

■ 2023-09-14T10:40 Test 2: import with 8 insertion workers

mongorestore --port=20000 -uadmin -p'passwd' --authenticationDatabase=admin --numInsertionWorkersPerCollection=8 --bypassDocumentValidation -d likingtest /u01/nfs/xxxxx_mongodb/10.1.1.1/20230914/likingtest >> 10.2.2.2.log 2>&1 &
tail -100f /u01/nfs/xxxxx_mongodb/10.1.1.1/20230914/10.2.2.2.log
---
2023-09-14T10:40:45.492+0800    The --db and --collection flags are deprecated for this use-case; please use --nsInclude instead, i.e. with --nsInclude=${DATABASE}.${COLLECTION}
...
2023-09-14T10:40:48.493+0800    [........................]       likingtest.oprceDataObj   112MB/1208GB    (0.0%)
...
2023-09-14T12:57:34.859+0800    [########################]       likingtest.oprceDataObj  1208GB/1208GB  (100.0%)
2023-09-14T12:57:34.867+0800    finished restoring likingtest.oprceDataObj (53413481 documents, 0 failures)

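Observations:
1. With the concurrency parameter --numInsertionWorkersPerCollection=8 and the validation parameter --bypassDocumentValidation, restore speed improved greatly again: the 1.2T collection oprceDataObj dropped from about 12h with the default restore settings to 2h17m.
2. This restore read the backup over NFS onto an 8-core VM. With 8 workers, CPU usage was about 40%, network receive throughput was about 300MB/s, and local disk write throughput ranged from about 30 to 200MB/s, so network bandwidth was not the bottleneck. A better-provisioned host, especially one with faster disk I/O, should shorten the restore further.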
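
To compare runs on a common scale, the elapsed times above can be turned into an average throughput for oprceDataObj. This is a small sketch using only the sizes and durations reported in the logs; the function name is made up for illustration, and the result is rounded down to whole MB/s.

```shell
#!/bin/sh
# Average restore throughput in MB/s (integer, rounded down),
# given a collection size in GB and the elapsed hours and minutes.
avg_mb_per_s() (
    size_gb=$1
    hours=$2
    minutes=$3
    seconds=$(( hours * 3600 + minutes * 60 ))
    echo $(( size_gb * 1024 / seconds ))
)

# Figures taken from the restore logs in this article.
echo "4 workers: $(avg_mb_per_s 1205 4 6) MB/s"    # 1205GB in about 4h06m
echo "8 workers: $(avg_mb_per_s 1208 2 17) MB/s"   # 1208GB in about 2h17m
```

These averages cover that one collection only. Other collections restored in parallel are not counted, so the total write rate on the host was higher at times.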

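■ 2023-09-14T16:10 Test 3: import with 12 insertion workers

[Note] Newer versions of mongorestore deprecate the -d and -c parameters. They still work but are less flexible, so the new --nsInclude parameter should be used instead. It took several attempts to find its constraint: the directory argument must be the root of the backup (the parent of the database directory), not the database directory itself. That is, something like dumpdir/20230914, not dumpdir/20230914/database. This is a major pitfall, so keep it in mind. That directory also must not contain any other files that mongorestore cannot recognize, or the restore reports an error.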
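
The directory rule can be sketched as a small helper that takes the database directory used with the old -d form and derives what --nsInclude needs: the parent directory as the dump root and the directory name as the database name. The helper name is hypothetical, and it only prints the command so the result can be reviewed before running it.

```shell
#!/bin/sh
# Sketch: convert the database directory used with the deprecated -d form
# into the arguments expected with --nsInclude. Prints the command only.
build_restore_cmd() (
    db_dir=${1%/}                    # e.g. dumpdir/20230914/likingtest (trailing slash removed)
    workers=$2                       # value for --numInsertionWorkersPerCollection
    dump_root=$(dirname "$db_dir")   # dumpdir/20230914: the directory mongorestore must be given
    db_name=$(basename "$db_dir")    # likingtest: used to build the namespace pattern
    printf '%s\n' "mongorestore --port=20000 -uadmin -p'passwd' --authenticationDatabase=admin --numInsertionWorkersPerCollection=$workers --bypassDocumentValidation --nsInclude=\"${db_name}.*\" $dump_root"
)

build_restore_cmd /u01/nfs/xxxxx_mongodb/10.1.1.1/20230914/likingtest 8
```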

mongorestore --port=20000 -uadmin -p'passwd' --authenticationDatabase=admin --numInsertionWorkersPerCollection=12 --bypassDocumentValidation --nsInclude="likingtest.*" /u01/nfs/xxxxx_mongodb/10.1.1.1/20230914 > 20230914.10.2.2.2-3.log 2>&1 &
tail -100f /u01/nfs/xxxxx_mongodb/10.1.1.1/20230914.10.2.2.2-3.log
---
2023-09-14T16:10:19.245+0800    preparing collections to restore from
...
2023-09-14T18:18:18.996+0800    [########################]  likingtest.oprceDataObj  1208GB/1208GB  (100.0%)
2023-09-14T18:18:19.014+0800    finished restoring likingtest.oprceDataObj (53413481 documents, 0 failures)

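Observations:
1. Raising the worker count from 8 to 12 brought no real gain. The conclusion is that 6-8 workers are enough, which is similar to Oracle, where a parallel import setting of 6 is generally the best practice.
2. This restore read the backup over NFS onto an 8-core VM. With 12 workers, CPU usage was about 60%, network receive throughput was about 300MB/s, and local disk write throughput ranged from about 30 to 500MB/s, so network bandwidth was not the bottleneck. A better-provisioned host, especially one with faster disk I/O, should shorten the restore further.
3. On restoring indexes: the restore loads the data first and creates the indexes last, and index creation on the larger collections still takes a long time: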

      currentOpTime: '2023-09-14T20:23:59.435+08:00',
...
      command: {
        createIndexes: 'oprceDataObj',
        indexes: [
          {
            key: { flowId: 1, 'activityConfiguration.activityNameEn': 1 },
            name: 'flowId_1_activityConfiguration.activityNameEn_1',
            ns: 'likingtest.oprceDataObj'
          },
          {
            key: { flowNo: 1 },
            name: 'flowNo_1',
            ns: 'likingtest.oprceDataObj'
          },
          {
            key: {
              'oprceInfo.oprceInstID': 1,
              'activityInfo.activityInstID': 1,
              'workitemInfo.workItemID': 1
            },
            name: 'oprceInfo.oprceInstID_1_activityInfo.activityInstID_1_workitemInfo.workItemID_1',
            ns: 'likingtest.oprceDataObj'
          }
        ],
.....
      currentOpTime: '2023-09-14T20:23:59.489+08:00',
...
      command: {
        createIndexes: 'oprcesDataObjInit',
        indexes: [
          {
            key: { flowId: 1, 'activityConfiguration.activityNameEn': 1 },
            name: 'flowId_1_activityConfiguration.activityNameEn_1',
            ns: 'likingtest.oprcesDataObjInit'
          },
          {
            key: {
              'oprceInfo.oprceInstID': 1,
              'activityInfo.activityInstID': 1,
              'workitemInfo.workItemID': 1
            },
            name: 'oprceInfo.oprceInstID_1_activityInfo.activityInstID_1_workitemInfo.workItemID_1',
            ns: 'likingtest.oprcesDataObjInit'
          }
        ],
...... Checked again the next day; the indexes were still not finished:
      currentOpTime: '2023-09-15T09:16:16.460+08:00',
      effectiveUsers: [ { user: 'admin', db: 'admin' } ],
      runBy: [ { user: '__system', db: 'local' } ],
      threaded: true,
      opid: 'shard1:11312917',
      lsid: {
        id: new UUID("e78379ff-9664-46b1-9e87-2bdd4abc5c5f"),
        uid: Binary.createFromBase64("O0CMtIVItQN4IsEOsJdrPL8s7jv5xwh5a/A5Qfvs2A8=", 0)
      },
      secs_running: Long("53877"),
      microsecs_running: Long("53877330742"),
      op: 'command',
      ns: 'likingtest.oprcesDataObjInit',
      redacted: false,
      command: {
        createIndexes: 'oprcesDataObjInit',
...... After a full 24h, on the second day, the indexes were still not finished:
      currentOpTime: '2023-09-15T18:55:16.877+08:00',
      effectiveUsers: [ { user: 'admin', db: 'admin' } ],
      runBy: [ { user: '__system', db: 'local' } ],
      threaded: true,
      opid: 'shard1:11312917',
      lsid: {
        id: new UUID("e78379ff-9664-46b1-9e87-2bdd4abc5c5f"),
        uid: Binary.createFromBase64("O0CMtIVItQN4IsEOsJdrPL8s7jv5xwh5a/A5Qfvs2A8=", 0)
      },
      secs_running: Long("88617"),
      microsecs_running: Long("88617747875"),
      op: 'command',
      ns: 'likingtest.oprcesDataObjInit',
      redacted: false,
      command: {
        createIndexes: 'oprcesDataObjInit',
        indexes: [
          {
            key: { flowId: 1, 'activityConfiguration.activityNameEn': 1 },
            name: 'flowId_1_activityConfiguration.activityNameEn_1',
            ns: 'likingtest.oprcesDataObjInit'
          },

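As shown above, the data import efficiency of mongorestore is now basically controllable and acceptable, at least for a 1.2T collection. The final index creation, however, is far too slow, and no suitable fix was found: the indexes would need to be built with multiple concurrent workers and then confirmed to be in effect, and in this run they ultimately did not take effect.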
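
For watching index builds like the ones above without reading the full currentOp output, a filtered query can help. The following is a minimal sketch, assuming mongosh is installed and using the same port and credentials as the mongorestore commands in this article. The function names are hypothetical, and the filter matches the fields visible in the output above (op: 'command', command.createIndexes).

```shell
#!/bin/sh
# Sketch: list in-progress createIndexes operations through mongosh.
index_build_filter() {
    # The quoted heredoc keeps $exists from being expanded by the shell.
    cat <<'EOF'
db.currentOp({ op: "command", "command.createIndexes": { $exists: true } }).inprog.forEach(function (o) {
  print(o.opid + "  " + o.ns + "  running " + o.secs_running + "s");
});
EOF
}

show_index_builds() {
    # Connection settings copied from the article's commands (assumed to apply to mongosh too).
    mongosh --port 20000 --username admin --password 'passwd' \
        --authenticationDatabase admin --quiet --eval "$(index_build_filter)"
}

# Print the script only; call show_index_builds on a host that can reach the cluster.
index_build_filter
```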

■ 2023-09-15T19:02 Test 4: import with 10 insertion workers, without restoring indexes

mongorestore --port=20000 -uadmin -p'passwd' --authenticationDatabase=admin --numInsertionWorkersPerCollection=10 --bypassDocumentValidation --nsInclude="likingtest.*" --nsFrom="likingtest.*" --nsTo="likingtest.*" --noIndexRestore /u01/nfs/xxxxx_mongodb/10.1.1.1/20230914 > 20230914.10.2.2.2-4.log 2>&1 &
tail -100f /u01/nfs/xxxxx_mongodb/10.1.1.1/20230914.10.2.2.2-4.log
2023-09-15T19:02:59.747+0800    preparing collections to restore from
...
2023-09-15T21:24:36.145+0800    [########################]  likingtest.oprceDataObj  1208GB/1208GB  (100.0%)
2023-09-15T21:24:36.161+0800    finished restoring likingtest.oprceDataObj (53413481 documents, 0 failures)
2023-09-15T21:24:36.165+0800    97367732 document(s) restored successfully. 0 document(s) failed to restore.

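As shown above, this run took 2h22m.

Conclusion

1. Restore large collections with multiple insertion workers: --numInsertionWorkersPerCollection=8
2. Do not restore indexes during the import: --noIndexRestore
3. After the data is restored, create the indexes in the background (search this site for "MongoDB 重建索引", i.e. rebuilding MongoDB indexes)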
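
The article leaves the index step to a separate post. As an illustration only, the three oprceDataObj indexes seen in the restore log could be recreated after a --noIndexRestore import roughly as follows. This assumes mongosh is available and reuses the article's port and credentials; the function names are hypothetical, and the key patterns are copied from the index definitions in the log. With no explicit names given, MongoDB derives default names from the key patterns.

```shell
#!/bin/sh
# Sketch: recreate the oprceDataObj indexes after restoring with --noIndexRestore,
# then list the collection's indexes to confirm they exist.
create_index_script() {
    cat <<'EOF'
const coll = db.getSiblingDB("likingtest").getCollection("oprceDataObj");
printjson(coll.createIndexes([
  { flowId: 1, "activityConfiguration.activityNameEn": 1 },
  { flowNo: 1 },
  { "oprceInfo.oprceInstID": 1, "activityInfo.activityInstID": 1, "workitemInfo.workItemID": 1 }
]));
printjson(coll.getIndexes());
EOF
}

run_create_indexes() {
    # Connection settings copied from the article's commands (assumed to apply to mongosh too).
    mongosh --port 20000 --username admin --password 'passwd' \
        --authenticationDatabase admin --quiet --eval "$(create_index_script)"
}

# Print the script only; call run_create_indexes once the data restore has finished.
create_index_script
```

Listing the indexes at the end is a simple way to check that they actually took effect, which is the step that failed in the earlier tests.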
