Redis bgsave failure analysis

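A few days ago our Redis instance ran into trouble, and bgsave looked like the likely cause. A plain kill on the bgsave child got no response for a long time, so I went straight to kill -9.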


[18862] xx Xxx 11:54:26.096 * 1 changes in 1800 seconds. Saving...
[18862] xx Xxx 11:54:26.533 * Background saving started by pid 15785
[18862] xx Xxx 11:57:41.308 # Background saving terminated by signal 9
[18862] xx Xxx 11:57:43.185 * 1 changes in 1800 seconds. Saving...
[18862] xx Xxx 11:57:43.621 * Background saving started by pid 16600
[18862] xx Xxx 11:58:56.646 # Background saving terminated by signal 9

After the kill, however, Redis had become read-only and refused writes:


redis1:6379> set 1 1
(error) MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.

Checking the source, this message is the shared bgsaveerr reply:


shared.bgsaveerr = createObject(REDIS_STRING,sdsnew(
"-MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.\r\n"));

When a bgsave does not succeed, the last bgsave status is set to error:


void backgroundSaveDoneHandlerDisk(int exitcode, int bysignal) {
    if (!bysignal && exitcode == 0) {
        redisLog(REDIS_NOTICE,
            "Background saving terminated with success");
        server.dirty = server.dirty - server.dirty_before_bgsave;
        server.lastsave = time(NULL);
        server.lastbgsave_status = REDIS_OK;
    } else if (!bysignal && exitcode != 0) {
        redisLog(REDIS_WARNING, "Background saving error");
        server.lastbgsave_status = REDIS_ERR;
    /* ... rest of the function omitted in this excerpt ... */
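
The excerpt stops before the branch that handles a child killed by a signal, which is the case in the log above. Below is a minimal sketch of the status mapping. It assumes that a signal-terminated child is also recorded as an error. That is consistent with the MISCONF error seen after kill -9, but the excerpt does not show it, and newer Redis versions may treat some signals differently. The names save_status and bgsave_status_after are hypothetical, not Redis identifiers.

```c
/* Hypothetical stand-ins for Redis's REDIS_OK / REDIS_ERR. */
enum save_status { SAVE_OK = 0, SAVE_ERR = -1 };

/* Simplified model of how the last-bgsave status is derived from the
 * way the child ended.
 *   exitcode: exit code of the child (meaningful only if bysignal == 0)
 *   bysignal: 0 if the child exited normally, otherwise the signal number
 * Assumption: any termination by signal counts as a failed save. */
enum save_status bgsave_status_after(int exitcode, int bysignal) {
    if (!bysignal && exitcode == 0)
        return SAVE_OK;   /* clean exit: snapshot written */
    return SAVE_ERR;      /* non-zero exit code, or killed by a signal */
}
```

Under this model, bgsave_status_after(0, 9) yields SAVE_ERR, which matches what happened here: each kill -9 left the status at error, and the next scheduled save was killed again before it could reset it.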

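Writes are gated in processCommand in redis.c: before running a write command it first checks this status, server.lastbgsave_status (reported as rdb_last_bgsave_status in INFO). Personally I think this check could be dropped.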


/* Don't accept write commands if there are problems persisting on disk
 * and if this is a master instance. */
if (((server.stop_writes_on_bgsave_err &&
      server.saveparamslen > 0 &&
      server.lastbgsave_status == REDIS_ERR) ||
      server.aof_last_write_status == REDIS_ERR) &&
    server.masterhost == NULL &&
    (c->cmd->flags & REDIS_CMD_WRITE ||
     c->cmd->proc == pingCommand))
{
    flagTransaction(c);
    if (server.aof_last_write_status == REDIS_OK)
        addReply(c, shared.bgsaveerr);
    else
        addReplySds(c,
            sdscatprintf(sdsempty(),
            "-MISCONF Errors writing to the AOF file: %s\r\n",
            strerror(server.aof_last_write_errno)));
    return REDIS_OK;
}
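
To make the condition easier to reason about, here is a small standalone model of it. This is a sketch rather than Redis code: struct server_model and writes_rejected are hypothetical names, the fields only mirror the ones tested in the excerpt, and the command is assumed to be a write (or PING).

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down mirror of the server fields that the
 * check in processCommand looks at. */
struct server_model {
    bool stop_writes_on_bgsave_err; /* stop-writes-on-bgsave-error yes/no */
    int  saveparamslen;             /* number of configured save points */
    bool last_bgsave_failed;        /* lastbgsave_status == REDIS_ERR */
    bool aof_last_write_failed;     /* aof_last_write_status == REDIS_ERR */
    bool is_master;                 /* masterhost == NULL */
};

/* Returns true if a write command would be refused with MISCONF. */
bool writes_rejected(const struct server_model *s) {
    bool rdb_problem = s->stop_writes_on_bgsave_err &&
                       s->saveparamslen > 0 &&
                       s->last_bgsave_failed;
    return (rdb_problem || s->aof_last_write_failed) && s->is_master;
}
```

The model shows that a failed bgsave blocks writes only while stop-writes-on-bgsave-error is on and at least one save point is configured. Turning the option off, or removing all save points, each clears the RDB half of the condition on its own.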

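The quick fix is to turn stop-writes-on-bgsave-error off, and then disable the automatic RDB snapshots by clearing the save points: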

config set stop-writes-on-bgsave-error no
config set save ""