为什么创建 Redis 集群时会自动错开主从节点?

节点,创建 ,集群,Redis · 浏览次数 : 151

小编点评

```c #include #include #include // Some cluster manager constants #define CLUSTER_MANAGER_LOG_LVL_SUCCESS 0 #define CLUSTER_MANAGER_LOG_LVL_WARN // Cluster Manager function to optimize anti-affinity allocation void clusterManagerOptimizeAntiAffinity(clusterManagerNodeArray *ipnodes, int ip_count) { clusterManagerNode **offenders = NULL; int score = clusterManagerGetAntiAffinityScore(ipnodes, ip_count, NULL, NULL); // If no optimal solution found, return if (score == 0) goto cleanup; // Log information about optimizing anti-affinity allocation clusterManagerLogInfo(\">>> Trying to optimize slaves allocation \"for anti-affinity\\"); // Attempt to randomly swap a slave's assigned master causing an affinity problem with another random slave int rand_idx = rand() % score; clusterManagerNode *first = offenders[rand_idx], *second = NULL; clusterManagerNode **other_replicas = zcalloc((node_len - 1) * sizeof(*other_replicas)); // Get list of nodes listIter li; listNode *ln; listRewind(cluster_manager.nodes, &li); // Swap slaves' assigned masters while ((ln = listNext(&li)) != NULL) { clusterManagerNode *n = ln->value; if (n != first & n->replicate != NULL) { other_replicas[other_replicas_count++] = n; } } // Clean up if (other_replicas) { zfree(other_replicas); } // Get score again score = clusterManagerGetAntiAffinityScore(ipnodes, ip_count, NULL, NULL); // Log information about final anti-affinity obtained char *msg; int perfect = (score == 0); int log_level = (perfect ? CLUSTER_MANAGER_LOG_LVL_SUCCESS : CLUSTER_MANAGER_LOG_LVL_WARN); if (perfect) msg = \"[OK] Perfect anti-affinity obtained!\"; else if (score >= 10000) msg = (\"[WARNING] Some slaves are in the same host as their master\"); else msg = (\"[WARNING] Some slaves of the same master are in the same host\"); clusterManagerLog(log_level, \"%s\\", msg);cleanup; cleanup: zfree(offenders); } ``` **Output:** ``` [OK] Perfect anti-affinity obtained! [WARNING] Some slaves of the same master are in the same host as their master [WARNING] Some slaves of the same master are in the same host as their master ``` **Explanation:** * The `clusterManagerOptimizeAntiAffinity` function takes two arguments: `ipnodes` and `ip_count`. `ipnodes` is an array of cluster manager node pointers, and `ip_count` is the number of cluster manager nodes. * The function first gets the score of anti-affinity allocation. * Then, it attempts to randomly swap a slave's assigned master causing an affinity problem with another random slave. * If the swap is successful, the function adds the other slave to the list of nodes. * Finally, the function gets the score again and logs it.

正文

哈喽大家好,我是咸鱼

在《一台服务器上部署 Redis 伪集群》这篇文章中,咸鱼在创建 Redis 集群时并没有明确指定哪个 Redis 实例将担任 master,哪个将担任 slave

/usr/local/redis-4.0.9/src/redis-trib.rb create --replicas 1 192.168.149.131:6379 192.168.149.131:26379 192.168.149.131:6380 192.168.149.131:26380 192.168.149.131:6381 192.168.149.131:26381

然而 Redis 却自动完成了主从节点的分配工作

如果大家在多台服务器部署过 Redis 集群的话,比如说在三台机器上部署三主三从的 redis 集群,你会观察到 Redis 自动地将主节点和从节点的部署位置错开

举个例子: master 1 和 slave 3 在同一台机器上; master 2和 slave 1 在同一台机器上; master 3 和 slave 2 在同一台机器上

这是为什么呢?

我们知道老版本的 Redis 集群管理命令是 redis-trib.rb,新版本则换成了 redis-cli

这两个可执行文件其实是一个用 C 编写的脚本,小伙伴们如果看过这两个文件的源码就会发现原因就在下面这段代码里

/* Return the anti-affinity score, which is a measure of the amount of
 * violations of anti-affinity in the current cluster layout, that is, how
 * badly the masters and slaves are distributed in the different IP
 * addresses so that slaves of the same master are not in the master
 * host and are also in different hosts.
 *
 * The score is calculated as follows:
 *
 * SAME_AS_MASTER = 10000 * each slave in the same IP of its master.
 * SAME_AS_SLAVE  = 1 * each slave having the same IP as another slave
                      of the same master.
 * FINAL_SCORE = SAME_AS_MASTER + SAME_AS_SLAVE
 *
 * So a greater score means a worse anti-affinity level, while zero
 * means perfect anti-affinity.
 *
 * The anti affinity optimization will try to get a score as low as
 * possible. Since we do not want to sacrifice the fact that slaves should
 * not be in the same host as the master, we assign 10000 times the score
 * to this violation, so that we'll optimize for the second factor only
 * if it does not impact the first one.
 *
 * The ipnodes argument is an array of clusterManagerNodeArray, one for
 * each IP, while ip_count is the total number of IPs in the configuration.
 *
 * The function returns the above score, and the list of
 * offending slaves can be stored into the 'offending' argument,
 * so that the optimizer can try changing the configuration of the
 * slaves violating the anti-affinity goals. */
static int clusterManagerGetAntiAffinityScore(clusterManagerNodeArray *ipnodes,
    int ip_count, clusterManagerNode ***offending, int *offending_len)
{
	...
    return score;
}

static void clusterManagerOptimizeAntiAffinity(clusterManagerNodeArray *ipnodes,
    int ip_count)
{
	...
}

通过注释我们可以得知,clusterManagerGetAntiAffinityScore 函数是用来计算反亲和性得分,这个得分表示了当前 Redis 集群布局中是否符合反亲和性的要求

反亲和性指的是 master 和 slave 不应该在同一台机器上,也不应该在相同的 IP 地址上

那如何计算反亲和性得分呢?

  • 如果有多个 slave 与同一个 master 在相同的 IP 地址上,那么对于每个这样的 slave,得分增加 10000
  • 如果有多个 slave 在相同的 IP 地址上,但它们彼此之间不是同一个 master,那么对于每个这样的 slave,得分增加 1
  • 最终得分是上述两部分得分之和

也就是说,得分越高,亲和性越高;得分越低,反亲和性越高;得分为零表示完全符合反亲和性的要求

获得得分之后,就会对得分高(反亲和性低)的节点进行优化

为了让 Redis 主从之间的反亲和性更高,clusterManagerOptimizeAntiAffinity 函数会对那些反亲和性很低的节点进行优化,它会尝试通过交换从节点的主节点,来改善集群中主从节点分布,从而减少反亲和性低问题

接下来我们分别来看下这两个函数

反亲和性得分计算

static int clusterManagerGetAntiAffinityScore(clusterManagerNodeArray *ipnodes,
    int ip_count, clusterManagerNode ***offending, int *offending_len)
{
	...
}

可以看到,该函数接受了四个参数:

  • ipnodes:一个包含多个 clusterManagerNodeArray 结构体的数组,每个结构体表示一个 IP 地址上的节点数组
  • ip_count:IP 地址的总数
  • offending:用于存储违反反亲和性规则的节点的指针数组(可选参数)
  • offending_len:存储 offending 数组中节点数量的指针(可选参数)

第一层 for 循环是遍历 ip 地址,第二层循环是遍历每个 IP 地址的节点数组

    ...
    for (i = 0; i < ip_count; i++) {
        clusterManagerNodeArray *node_array = &(ipnodes[i]);
        dict *related = dictCreate(&clusterManagerDictType);
        char *ip = NULL;
        for (j = 0; j < node_array->len; j++) {
			...
        }
    ...

我们来看下第二层 for 循环

    for (i = 0; i < ip_count; i++) {
        /* 获取每个 IP 地址的节点数组 */
        clusterManagerNodeArray *node_array = &(ipnodes[i]);
        /* 创建字典 related */
        dict *related = dictCreate(&clusterManagerDictType);
        char *ip = NULL;
        for (j = 0; j < node_array->len; j++) {
            /* 获取当前节点 */
            clusterManagerNode *node = node_array->nodes[j];
			...
            /* 在 related 字典中查找是否已经存在相应的键 */
            dictEntry *entry = dictFind(related, key);
            if (entry) types = sdsdup((sds) dictGetVal(entry));
            else types = sdsempty();
            if (node->replicate) types = sdscat(types, "s");
            else {
                sds s = sdscatsds(sdsnew("m"), types);
                sdsfree(types);
                types = s;
            }
            dictReplace(related, key, types);
        }

首先遍历每个 IP 地址的节点数组,对于每个 IP 地址上的节点数组,函数通过字典related来记录相同主节点和从节点的关系

其中字典 related的 key 是节点的名称,value 是一个字符串,表示该节点类型 types

对于每个节点,根据节点构建一个字符串类型的关系标记(types),将主节点标记为 m,从节点标记为 s

然后通过字典将相同关系标记的节点关联在一起,构建了一个记录相同主从节点关系的字典 related

	...     
		/* 创建字典迭代器,用于遍历节点分组信息 */
		dictIterator *iter = dictGetIterator(related);
        dictEntry *entry;
        while ((entry = dictNext(iter)) != NULL) {
            /* key 是节点名称,value 是 types,即节点类型 */
            sds types = (sds) dictGetVal(entry);
            sds name = (sds) dictGetKey(entry);
            int typeslen = sdslen(types);
            if (typeslen < 2) continue;
            /* 计算反亲和性得分 */
            if (types[0] == 'm') score += (10000 * (typeslen - 1));
            else score += (1 * typeslen);
            ...
        }

上面代码片段可知,while 循环遍历字典 related中的分组信息,计算相同主从节点关系的得分

  • 获取节点类型信息并长度
  • 如果是主节点类型,得分 += (10000 * (typeslen - 1));否则,得分 += (1 * typeslen)

如果有提供 offending 参数,将找到违反反亲和性规则的节点并存储到 offending 数组中,同时更新违反规则节点的数量,如下代码所示

            if (offending == NULL) continue;
            /* Populate the list of offending nodes. */
            listIter li;
            listNode *ln;
            listRewind(cluster_manager.nodes, &li);
            while ((ln = listNext(&li)) != NULL) {
                clusterManagerNode *n = ln->value;
                if (n->replicate == NULL) continue;
                if (!strcmp(n->replicate, name) && !strcmp(n->ip, ip)) {
                    *(offending_p++) = n;
                    if (offending_len != NULL) (*offending_len)++;
                    break;
                }
            }

最后返回得分 score,完整函数代码如下

static int clusterManagerGetAntiAffinityScore(clusterManagerNodeArray *ipnodes,
    int ip_count, clusterManagerNode ***offending, int *offending_len)
{
    int score = 0, i, j;
    int node_len = cluster_manager.nodes->len;
    clusterManagerNode **offending_p = NULL;
    if (offending != NULL) {
        *offending = zcalloc(node_len * sizeof(clusterManagerNode*));
        offending_p = *offending;
    }
    /* For each set of nodes in the same host, split by
     * related nodes (masters and slaves which are involved in
     * replication of each other) */
    for (i = 0; i < ip_count; i++) {
        clusterManagerNodeArray *node_array = &(ipnodes[i]);
        dict *related = dictCreate(&clusterManagerDictType);
        char *ip = NULL;
        for (j = 0; j < node_array->len; j++) {
            clusterManagerNode *node = node_array->nodes[j];
            if (node == NULL) continue;
            if (!ip) ip = node->ip;
            sds types;
            /* We always use the Master ID as key. */
            sds key = (!node->replicate ? node->name : node->replicate);
            assert(key != NULL);
            dictEntry *entry = dictFind(related, key);
            if (entry) types = sdsdup((sds) dictGetVal(entry));
            else types = sdsempty();
            /* Master type 'm' is always set as the first character of the
             * types string. */
            if (node->replicate) types = sdscat(types, "s");
            else {
                sds s = sdscatsds(sdsnew("m"), types);
                sdsfree(types);
                types = s;
            }
            dictReplace(related, key, types);
        }
        /* Now it's trivial to check, for each related group having the
         * same host, what is their local score. */
        dictIterator *iter = dictGetIterator(related);
        dictEntry *entry;
        while ((entry = dictNext(iter)) != NULL) {
            sds types = (sds) dictGetVal(entry);
            sds name = (sds) dictGetKey(entry);
            int typeslen = sdslen(types);
            if (typeslen < 2) continue;
            if (types[0] == 'm') score += (10000 * (typeslen - 1));
            else score += (1 * typeslen);
            if (offending == NULL) continue;
            /* Populate the list of offending nodes. */
            listIter li;
            listNode *ln;
            listRewind(cluster_manager.nodes, &li);
            while ((ln = listNext(&li)) != NULL) {
                clusterManagerNode *n = ln->value;
                if (n->replicate == NULL) continue;
                if (!strcmp(n->replicate, name) && !strcmp(n->ip, ip)) {
                    *(offending_p++) = n;
                    if (offending_len != NULL) (*offending_len)++;
                    break;
                }
            }
        }
        //if (offending_len != NULL) *offending_len = offending_p - *offending;
        dictReleaseIterator(iter);
        dictRelease(related);
    }
    return score;
}

反亲和性优化

计算出反亲和性得分之后,对于那些得分很低的节点,redis 就需要对其进行优化,提高集群中节点的分布,以避免节点在同一主机上

static void clusterManagerOptimizeAntiAffinity(clusterManagerNodeArray *ipnodes, int ip_count){	
    clusterManagerNode **offenders = NULL;
    int score = clusterManagerGetAntiAffinityScore(ipnodes, ip_count,
                                                   NULL, NULL);
    if (score == 0) goto cleanup;  
    ...
cleanup:
    zfree(offenders);
}

从上面的代码可以看到,如果得分为 0 ,说明反亲和性已经很好,无需优化。直接跳到 cleanup 去释放 offenders 节点的内存空间

如果得分不为 0 ,则就会设置一个最大迭代次数maxiter,这个次数跟节点的数量成正比,然后 while 循环在有限次迭代内进行优化操作

    ...
    int maxiter = 500 * node_len; // Effort is proportional to cluster size...
    while (maxiter > 0) {
    	...
        maxiter--;
    }
    ...

这个函数的核心就在 while 循环里,我们来看一下其中的一些片段

首先 offending_len 来存储违反规则的节点数,然后如果之前有违反规则的节点(offenders != NULL)则释放掉(zfree(offenders)

然后重新计算得分,如果得分为0或没有违反规则的节点,退出 while 循环

    int offending_len = 0;  
    if (offenders != NULL) {
        zfree(offenders);  // 释放之前存储的违反规则的节点
        offenders = NULL;
    }
    score = clusterManagerGetAntiAffinityScore(ipnodes,
                                               ip_count,
                                               &offenders,
                                               &offending_len);
    if (score == 0 || offending_len == 0) break; 

接着去随机选择一个违反规则的节点,尝试交换分配的 master

        int rand_idx = rand() % offending_len;
        clusterManagerNode *first = offenders[rand_idx],
                           *second = NULL;

		// 创建一个数组,用来存储其他可交换 master 的 slave
        clusterManagerNode **other_replicas = zcalloc((node_len - 1) *
                                                      sizeof(*other_replicas));

然后遍历集群中的节点,寻找能够交换 master 的 slave。如果没有找到,那就退出循环

    while ((ln = listNext(&li)) != NULL) {
        clusterManagerNode *n = ln->value;
        if (n != first && n->replicate != NULL)
            other_replicas[other_replicas_count++] = n;
    }
    
    if (other_replicas_count == 0) {
        zfree(other_replicas);
        break;
    }

如果找到了,就开始交换并计算交换后的反亲和性得分

    // 随机选择一个可交换的节点作为交换目标
    rand_idx = rand() % other_replicas_count;
    second = other_replicas[rand_idx];
    
    // 交换两个 slave 的 master 分配
    char *first_master = first->replicate,
         *second_master = second->replicate;
    first->replicate = second_master, first->dirty = 1;
    second->replicate = first_master, second->dirty = 1;
    
    // 计算交换后的反亲和性得分
    int new_score = clusterManagerGetAntiAffinityScore(ipnodes,
                                                       ip_count,
                                                       NULL, NULL);

如果交换后的得分比之前的得分还大,说明白交换了,还不如不交换,就会回顾;如果交换后的得分比之前的得分小,说明交换是没毛病的

    if (new_score > score) {
        first->replicate = first_master;
        second->replicate = second_master;
    }

最后释放资源,准备下一次 while 循环

    zfree(other_replicas);
    maxiter--;

总结一下:

  • 每次 while 循环会尝试随机选择一个违反反亲和性规则的从节点,并与另一个随机选中的从节点交换其主节点分配,然后重新计算交换后的反亲和性得分
  • 如果交换后的得分变大,说明交换不利于反亲和性,会回滚交换
  • 如果交换后得分变小,则保持,后面可能还需要多次交换
  • 这样,通过多次随机的交换尝试,最终可以达到更好的反亲和性分布

最后则是一些收尾工作,像输出日志信息,释放内存等,这里不过多介绍

    char *msg;
    int perfect = (score == 0);
    int log_level = (perfect ? CLUSTER_MANAGER_LOG_LVL_SUCCESS :
                               CLUSTER_MANAGER_LOG_LVL_WARN);
    if (perfect) msg = "[OK] Perfect anti-affinity obtained!";
    else if (score >= 10000)
        msg = ("[WARNING] Some slaves are in the same host as their master");
    else
        msg=("[WARNING] Some slaves of the same master are in the same host");
    clusterManagerLog(log_level, "%s\n", msg);

下面是完整代码

static void clusterManagerOptimizeAntiAffinity(clusterManagerNodeArray *ipnodes,
    int ip_count)
{
    clusterManagerNode **offenders = NULL;
    int score = clusterManagerGetAntiAffinityScore(ipnodes, ip_count,
                                                   NULL, NULL);
    if (score == 0) goto cleanup;
    clusterManagerLogInfo(">>> Trying to optimize slaves allocation "
                          "for anti-affinity\n");
    int node_len = cluster_manager.nodes->len;
    int maxiter = 500 * node_len; // Effort is proportional to cluster size...
    srand(time(NULL));
    while (maxiter > 0) {
        int offending_len = 0;
        if (offenders != NULL) {
            zfree(offenders);
            offenders = NULL;
        }
        score = clusterManagerGetAntiAffinityScore(ipnodes,
                                                   ip_count,
                                                   &offenders,
                                                   &offending_len);
        if (score == 0 || offending_len == 0) break; // Optimal anti affinity reached
        /* We'll try to randomly swap a slave's assigned master causing
         * an affinity problem with another random slave, to see if we
         * can improve the affinity. */
        int rand_idx = rand() % offending_len;
        clusterManagerNode *first = offenders[rand_idx],
                           *second = NULL;
        clusterManagerNode **other_replicas = zcalloc((node_len - 1) *
                                                      sizeof(*other_replicas));
        int other_replicas_count = 0;
        listIter li;
        listNode *ln;
        listRewind(cluster_manager.nodes, &li);
        while ((ln = listNext(&li)) != NULL) {
            clusterManagerNode *n = ln->value;
            if (n != first && n->replicate != NULL)
                other_replicas[other_replicas_count++] = n;
        }
        if (other_replicas_count == 0) {
            zfree(other_replicas);
            break;
        }
        rand_idx = rand() % other_replicas_count;
        second = other_replicas[rand_idx];
        char *first_master = first->replicate,
             *second_master = second->replicate;
        first->replicate = second_master, first->dirty = 1;
        second->replicate = first_master, second->dirty = 1;
        int new_score = clusterManagerGetAntiAffinityScore(ipnodes,
                                                           ip_count,
                                                           NULL, NULL);
        /* If the change actually makes thing worse, revert. Otherwise
         * leave as it is because the best solution may need a few
         * combined swaps. */
        if (new_score > score) {
            first->replicate = first_master;
            second->replicate = second_master;
        }
        zfree(other_replicas);
        maxiter--;
    }
    score = clusterManagerGetAntiAffinityScore(ipnodes, ip_count, NULL, NULL);
    char *msg;
    int perfect = (score == 0);
    int log_level = (perfect ? CLUSTER_MANAGER_LOG_LVL_SUCCESS :
                               CLUSTER_MANAGER_LOG_LVL_WARN);
    if (perfect) msg = "[OK] Perfect anti-affinity obtained!";
    else if (score >= 10000)
        msg = ("[WARNING] Some slaves are in the same host as their master");
    else
        msg=("[WARNING] Some slaves of the same master are in the same host");
    clusterManagerLog(log_level, "%s\n", msg);
cleanup:
    zfree(offenders);
}

与为什么创建 Redis 集群时会自动错开主从节点?相似的内容:

为什么创建 Redis 集群时会自动错开主从节点?

哈喽大家好,我是咸鱼 在《[一台服务器上部署 Redis 伪集群》](https://mp.weixin.qq.com/s?__biz=MzkzNzI1MzE2Mw==&mid=2247486439&idx=1&sn=0b10317397ef3259dd98d493915dd706&chksm=c2

一台服务器上部署 Redis 伪集群

哈喽大家好,我是咸鱼 今天这篇文章介绍如何在一台服务器(以 CentOS 7.9 为例)上通过 `redis-trib.rb` 工具搭建 Redis cluster (三主三从) `redis-trib.rb` 是一个基于 Ruby 编写的脚本,其功能涵盖了创建、管理以及维护 Redis 集群的各个

[转帖]为什么redis的SDS的最大长度限制为512mb?

当客户端操作 client 时,一般不会直接使用 sds ,而是通过对象的方式来使用。比如创建的字符串其实是一个对象,间接使用到了 sds 结构。限制 512M 的逻辑在 t_string.c 的 checkStringLength 方法。 在redis3.2.13、redis4.0.14、redi

[转帖]Redis客户端Jedis、Lettuce、Redisson

https://www.jianshu.com/p/90a9e2eccd73 在SpringBoot2.x之后,原来使用的jedis被替换为了lettuce Jedis:采用的直连,BIO网络模型 Jedis有一个问题:多个线程使用一个连接的时候线程不安全。 解决思路是: 使用连接池,为每个请求创建

claude3国内API接口对接

众所周知,由于地理位置原因,Claude3不对国内开放,而国内的镜像网站使用又贵的离谱! 因此,团队萌生了一个想法:为什么不创建一个一站式的平台,让用户能够通过单一的接口与多个模型交流呢?这样,用户就可以轻松地比较不同模型的表现,并根据需要选择最合适的一个。于是诞生了这个ChatGPT,Claude

C静态库的创建与使用--为什么要引入静态库?

C源程序需要经过预处理、编译、汇编几个阶段,得到各自源文件对应的可重定位目标文件,可重定位目标文件就是各个源文件的二进制机器代码,一般是.o格式。比如:util1.c、util2.c及main.c三个C源文件,经过预处理器、编译器、汇编器的处理,就可以得到各自的目标文件util1.o,util2.o

如何创建一个线程池,为什么不推荐使用Executors去创建呢?

我们在学线程的时候了解了几种创建线程的方式,比如继承Thread类,实现Runnable接口、Callable接口等,那对于线程池的使用,也需要去创建它,在这里我们提供2种构造线程池的方法: 方法一: 通过ThreadPoolExecutor构造函数来创建(首选) 这是JDK中最核心的线程池工具类,

MASA MinimalAPI源码解析:为什么我们只写了一个app.MapGet,却生成了三个接口

源码解析:为什么我们只写了一个app.MapGet,却生成了三个接口 1.ServiceBase 1.AutoMapRoute 源码如下: AutoMapRoute自动创建map路由,MinimalAPI会根据service中的方法,创建对应的api接口。 比如上文的一个方法: public asy

Pod安全策略:PodSecurityPolicy(PSP)

Pod安全策略:PodSecurityPolicy(PSP),PodSecurityPolicy 简介,为什么需要 PodSecurityPolicy,启用PodSecurityPolicy(PSP),PSP规则之禁止创建特权用户pod,PSP规则之限定数据卷使用某一个目录,PSP规则之限定数据卷的...

初探webAssembly

本文从为什么需要WebAssembly、WebAssembly的工作原理、哪些语言可用来创建WebAssembly模块、WebAssembly可以用在哪里 以及 怎么使用 几方面简要介绍了webAssembly。如果之前没有了解过webAssembly,可以做一些简要的了解。