[Repost] Redis 7.0 Three-Node Sentinel High-Availability Setup Guide

1 Sentinel High-Availability Architecture

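The earliest high-availability scheme for Redis was master-slave replication. Its drawback is that when the Master goes down, no Slave is promoted automatically, so manual intervention is required. Sentinel mode was therefore introduced on top of replication as a high-availability solution.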

For details on Redis replication and Sentinel, see the official documentation:

Redis replication
https://redis.io/docs/manual/replication/
High availability with Redis Sentinel
https://redis.io/docs/manual/sentinel/

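Sentinel is a special mode of Redis. Each Sentinel runs as its own independent process. It monitors multiple running Redis instances by sending them commands and waiting for their responses.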

The Redis Sentinel system manages multiple Redis servers and performs the following three tasks:

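Monitoring: Sentinel continuously checks whether your Master and Slave instances are running normally.
Notification: When a monitored Redis instance has a problem, Sentinel can notify the administrator or other applications through an API.
Automatic failover: When a Master is not working properly, Sentinel starts an automatic failover. It promotes one of the failed Master's Slaves to be the new Master and reconfigures the remaining Slaves to replicate from the new Master. Clients that ask Sentinel for the Master's address are then given the address of the new Master, so the deployment keeps working with the new Master in place of the failed one.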
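To make the last task more concrete: a client can ask any Sentinel for the current Master address with the SENTINEL get-master-addr-by-name command. The sketch below only builds that request and decodes a reply in the Redis wire format (RESP). The function names are invented for this example, and the sample reply is written by hand, not captured from the servers in this article.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        out.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(out)


def parse_master_addr(reply: bytes):
    """Decode a two-element RESP array reply into (host, port).

    Returns None for a null reply, which Sentinel sends when it
    does not know the requested master name.
    """
    lines = reply.split(b"\r\n")
    if lines[0] in (b"*-1", b"$-1"):
        return None
    if lines[0] != b"*2":
        raise ValueError(f"unexpected reply: {reply!r}")
    # Layout: "*2", "$<len>", host, "$<len>", port, ""
    host = lines[2].decode()
    port = int(lines[4])
    return host, port


# Request a client would send to a Sentinel on port 26379.
request = encode_command("SENTINEL", "get-master-addr-by-name", "mymaster")

# Hand-written example of what a reply could look like.
sample_reply = b"*2\r\n$14\r\n172.31.185.165\r\n$4\r\n6379\r\n"

print(request)
print(parse_master_addr(sample_reply))
```

In practice a client library normally handles this exchange; the point here is only that the Master address comes from Sentinel, not from the failed Master.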

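A single Sentinel process monitoring the Redis servers can itself fail. To avoid this, several Sentinels are deployed. They monitor the Redis servers and also monitor one another, which forms a multi-Sentinel setup.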

2 Setting Up the Sentinel Environment

2.1 Installing Standalone Redis

We test on three hosts here:

[dave@www.cndba.cn_1 local]# cat /etc/hosts
127.0.0.1   localhost
172.31.185.120 mongodb1
172.31.185.165 mongodb2
172.31.185.131 mongodb3
[dave@www.cndba.cn_1 local]#

Install standalone Redis on all three hosts. For the detailed steps, see this earlier blog post:

Installing Redis 7 on Linux 7.8 and Configuring Start on Boot
https://www.cndba.cn/dave/article/108061

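2.2 Setting Up Master-Slave Replication

Sentinel is built on top of master-slave replication, so replication has to be configured before Sentinel. This is straightforward.

The planned layout is:

172.31.185.120 mongodb1 : Master
172.31.185.165 mongodb2 : Slave
172.31.185.131 mongodb3 : Slave

Run the following commands on the two Slave nodes:

replicaof 172.31.185.120 6379
config set masterauth redis   # required because our Master has a password set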


[dave@www.cndba.cn_3 ~]# redis-cli
127.0.0.1:6379> auth redis
OK
127.0.0.1:6379> replicaof 172.31.185.120 6379
OK
127.0.0.1:6379> config get masterauth
1) "masterauth"
2) ""
127.0.0.1:6379> config set masterauth redis
OK
127.0.0.1:6379> config rewrite
OK
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:172.31.185.120
master_port:6379
master_link_status:up
master_last_io_seconds_ago:2
……

[dave@www.cndba.cn_2 etc]# redis-cli
127.0.0.1:6379> auth redis
OK
127.0.0.1:6379> replicaof 172.31.185.120 6379
OK
127.0.0.1:6379> config set masterauth redis
OK
127.0.0.1:6379> config rewrite
OK
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:172.31.185.120
master_port:6379
master_link_status:up
master_last_io_seconds_ago:2
……

Check on the Master:
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.31.185.131,port=6379,state=online,offset=252,lag=1
slave1:ip=172.31.185.165,port=6379,state=online,offset=252,lag=0
master_failover_state:no-failover
master_replid:e07445bc0cc0e180c25c3e006af5b765943b2f6e
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:252
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:252
127.0.0.1:6379>
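When this check has to be repeated on several nodes, the INFO output can be parsed by a script. The following is a minimal sketch that assumes the output has already been captured as text (for example from redis-cli); parse_info and replicas are helper names made up for this example, and the sample text is a shortened copy of the Master output above.

```python
def parse_info(text: str) -> dict:
    """Parse 'key:value' lines of INFO output; '#' headers and blanks are skipped."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        info[key] = value
    return info


def replicas(info: dict) -> list:
    """Return the slaveN entries as dicts of their comma-separated fields."""
    result = []
    for key in sorted(info):
        # Match slave0, slave1, ... but not slave_priority and similar keys.
        if key.startswith("slave") and key[5:].isdigit():
            fields = dict(item.split("=", 1) for item in info[key].split(","))
            result.append(fields)
    return result


sample = """# Replication
role:master
connected_slaves:2
slave0:ip=172.31.185.131,port=6379,state=online,offset=252,lag=1
slave1:ip=172.31.185.165,port=6379,state=online,offset=252,lag=0
master_repl_offset:252
"""

info = parse_info(sample)
online = [r["ip"] for r in replicas(info) if r["state"] == "online"]
print(info["role"], online)
```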

2.3 Configuring Sentinel

The Sentinel configuration template ships in the Redis directory (here the unpacked source tree). We copy it to /etc:

[dave@www.cndba.cn_1 redis-7.0.1]# pwd
/root/redis-7.0.1
[dave@www.cndba.cn_1 redis-7.0.1]# ll *.conf
-rw-rw-r-- 1 root root 106547 Jun  8 17:56 redis.conf
-rw-rw-r-- 1 root root  13924 Jun  8 17:56 sentinel.conf
[dave@www.cndba.cn_1 redis-7.0.1]#
[dave@www.cndba.cn_1 redis-7.0.1]# cp sentinel.conf /etc/

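Change the following parameters in sentinel.conf:

### General settings

port 26379
# Disable protected mode so that other servers can reach this Sentinel
protected-mode no
# Whether Sentinel runs in the background; the default is no, changed to yes
daemonize yes
pidfile /var/run/redis-sentinel.pid
# Log file location
logfile /usr/local/redis/sentinel/redis-sentinel.log
# Working directory
dir /usr/local/redis/sentinel

### Core settings
# sentinel monitor <master-name> <ip> <port> <quorum>
# master-name: name of the monitored master. It can be changed, but then
#              every directive below that refers to it must be changed too.
# ip:          IP address of the master; mongodb1 is the master here.
# port:        Redis port of the master.
# quorum:      number of Sentinels that must agree that the master is down.
#              With 2, the master is marked as down once at least 2 Sentinels
#              consider it down. A failover is then started and one of the
#              Slaves is promoted to master.
sentinel monitor mymaster 172.31.185.120 6379 2
# Password of the Redis master
sentinel auth-pass mymaster redis
# How long (in milliseconds) the master must be unreachable before a
# Sentinel considers it down.
# The default is 30s; changed to 10s here for testing.
sentinel down-after-milliseconds mymaster 10000
# Number of Slaves that resync with the new master in parallel after a failover; default 1
sentinel parallel-syncs mymaster 1
# Failover timeout: if a failover has not completed within 3 minutes, it is
# treated as failed and a new attempt can be made
sentinel failover-timeout mymaster 180000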
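Because the same master name has to appear in every core directive, and the same file is used on all three nodes, it can help to generate these lines from one place. The following is only a sketch: MonitorSpec and render are hypothetical names, and the defaults mirror the values mentioned in the comments above.

```python
from dataclasses import dataclass


@dataclass
class MonitorSpec:
    name: str            # master name used by every directive
    host: str
    port: int
    quorum: int
    password: str
    down_after_ms: int = 30000
    parallel_syncs: int = 1
    failover_timeout_ms: int = 180000


def render(spec: MonitorSpec) -> str:
    """Render the per-master Sentinel directives as config text."""
    if spec.quorum < 1:
        raise ValueError("quorum must be at least 1")
    lines = [
        f"sentinel monitor {spec.name} {spec.host} {spec.port} {spec.quorum}",
        f"sentinel auth-pass {spec.name} {spec.password}",
        f"sentinel down-after-milliseconds {spec.name} {spec.down_after_ms}",
        f"sentinel parallel-syncs {spec.name} {spec.parallel_syncs}",
        f"sentinel failover-timeout {spec.name} {spec.failover_timeout_ms}",
    ]
    return "\n".join(lines) + "\n"


# Values taken from the configuration in this article.
spec = MonitorSpec("mymaster", "172.31.185.120", 6379, 2, "redis",
                   down_after_ms=10000)
print(render(spec), end="")
```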

2.4 Starting the Sentinel Process on All Three Nodes

[dave@www.cndba.cn_1 redis]# redis-sentinel /etc/sentinel.conf

[dave@www.cndba.cn_2 etc]# redis-sentinel /etc/sentinel.conf

[dave@www.cndba.cn_3 redis-7.0.1]# redis-sentinel /etc/sentinel.conf

[dave@www.cndba.cn_1 redis]# ps -ef|grep redis
root      5738     1  0 15:11 ?        00:00:00 redis-sentinel *:26379 [sentinel]
root      6081 32434  0 15:13 pts/0    00:00:00 grep --color=auto redis
root     27023     1  0 Jun16 ?        00:02:06 /usr/local/redis/bin/redis-server 0.0.0.0:6379
[dave@www.cndba.cn_1 redis]#

2.5 Checking the Sentinel Status

Master node:

[dave@www.cndba.cn_1 redis]# redis-cli
127.0.0.1:6379> auth redis
OK
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:8815ccfaffbed8d45123a6f5e7426d146c497134
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
127.0.0.1:6379>

3 Verifying the Sentinel Environment

Check the Master status:

127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.31.185.131,port=6379,state=online,offset=9825,lag=0
slave1:ip=172.31.185.165,port=6379,state=online,offset=9825,lag=1
master_failover_state:no-failover
master_replid:e07445bc0cc0e180c25c3e006af5b765943b2f6e
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:9968
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:9968
127.0.0.1:6379>

Kill the Master:

[dave@www.cndba.cn_1 ~]# ps -ef|grep redis
root      9914 32434  0 15:39 pts/0    00:00:00 redis-cli
root     10348     1  0 15:41 ?        00:00:00 redis-sentinel *:26379 [sentinel]
root     10606 10286  0 15:43 pts/1    00:00:00 grep --color=auto redis
root     27023     1  0 Jun16 ?        00:05:15 /usr/local/redis/bin/redis-server 0.0.0.0:6379
[dave@www.cndba.cn_1 ~]# kill -9 27023

Checking on a Slave shows that no failover took place:

127.0.0.1:6379> info replication
# Replication
role:slave
master_host:172.31.185.120
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_read_repl_offset:29256
slave_repl_offset:29256
master_link_down_since_seconds:159
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:e07445bc0cc0e180c25c3e006af5b765943b2f6e
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:29256
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:15
repl_backlog_histlen:29242
127.0.0.1:6379>

After changing the Sentinel quorum to 1, the failover succeeded:

sentinel monitor mymaster 172.31.185.120 6379 1

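A note on the quorum: it is the number of Sentinels that must agree that the Master is unreachable before a failover is attempted, and the failover itself must additionally be authorized by a majority of all known Sentinels. The quorum is often set to a majority, N/2+1, which gives 2 for three Sentinels and is also the value in the official three-node example. In this test the switch only happened with a quorum of 1. One possible explanation is that the Sentinels had not discovered one another, which can be checked by running SENTINEL CKQUORUM mymaster against port 26379.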
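How the quorum and the majority vote interact can be written down as a small model. This is a simplified sketch, not Sentinel's actual code: the function names are invented here, and it assumes that flagging the Master as down needs quorum Sentinels, while authorizing the failover needs votes from at least the larger of the majority and the quorum.

```python
def majority(known_sentinels: int) -> int:
    """Smallest number of Sentinels that forms a majority."""
    return known_sentinels // 2 + 1


def failover_possible(known: int, quorum: int, see_down: int, can_vote: int) -> bool:
    """Simplified model of the two conditions for a failover.

    known:    Sentinels known for this master, including the local one
    quorum:   value from 'sentinel monitor'
    see_down: Sentinels that currently consider the master down
    can_vote: Sentinels that are reachable and able to vote
    """
    flagged_down = see_down >= quorum
    authorized = can_vote >= max(majority(known), quorum)
    return flagged_down and authorized


# Three Sentinels that all see each other, quorum 2.
print(failover_possible(known=3, quorum=2, see_down=3, can_vote=3))
# A Sentinel that knows no peers, quorum 2: it can never reach the quorum.
print(failover_possible(known=1, quorum=2, see_down=1, can_vote=1))
# The same isolated Sentinel with quorum 1.
print(failover_possible(known=1, quorum=1, see_down=1, can_vote=1))
```

The second and third calls describe one hypothetical situation in which a quorum of 2 blocks the failover and a quorum of 1 lets it through; whether that was the cause in this test is not confirmed.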

After killing the Master here, the Master role switched to the mongodb2 node:

[dave@www.cndba.cn_2 ~]# redis-cli
127.0.0.1:6379> auth redis
OK
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=172.31.185.131,port=6379,state=online,offset=87057,lag=0
master_failover_state:no-failover
master_replid:89e8552ec67362ea3c1a4d07244111757a42b259
master_replid2:afaf2753e116b73f2c9f5b02996d7583cf125ad4
master_repl_offset:87200
second_repl_offset:76526
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1311
repl_backlog_histlen:85890
127.0.0.1:6379>

Restart the original Master. It has now become a Slave as well:

[dave@www.cndba.cn_1 redis]# systemctl start redis
[dave@www.cndba.cn_1 redis]# redis-cli
127.0.0.1:6379> auth redis
OK
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:172.31.185.165
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_read_repl_offset:162073
slave_repl_offset:162073
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:89e8552ec67362ea3c1a4d07244111757a42b259
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:162073
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:159320
repl_backlog_histlen:2754
127.0.0.1:6379>
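The final state can be summarized as a simple consistency check: exactly one node reports role:master, and every Slave points at it. The sketch below assumes the role and master_host values have already been collected from INFO replication on each node; check_topology is a hypothetical helper, and the sample data is transcribed from the outputs in this section.

```python
def check_topology(roles: dict) -> str:
    """Return the master's IP, or raise ValueError if the roles are inconsistent.

    roles maps node IP -> (role, master_host); master_host is None
    for the node that reports role 'master'.
    """
    masters = [ip for ip, (role, _) in roles.items() if role == "master"]
    if len(masters) != 1:
        raise ValueError(f"expected exactly one master, found {masters}")
    master = masters[0]
    for ip, (role, master_host) in roles.items():
        if role == "slave" and master_host != master:
            raise ValueError(f"{ip} replicates from {master_host}, not {master}")
    return master


# State after the failover and the restart of the original Master.
after_failover = {
    "172.31.185.120": ("slave", "172.31.185.165"),
    "172.31.185.165": ("master", None),
    "172.31.185.131": ("slave", "172.31.185.165"),
}
print(check_topology(after_failover))
```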