Using GlusterFS for File Server High Availability

A project I am working on needs a highly available file server. I had previously considered an nfs+rsync+inotify+keepalived style setup, but it has quite a few problems: once the primary NFS server dies, every client that has the share mounted has to remount it. So I turned to a distributed file system instead; there are plenty of articles online comparing the options. I picked GlusterFS mainly because its deployment is flexible, it has no single point of failure, and it can be mounted directly (via FUSE).

Below is the actual deployment. The operating system is CentOS 5.6 x86_64. The client's IP is 192.168.0.201, and the servers' IPs are 192.168.0.202 and 192.168.0.203. The directory exported on the servers is /home/filecluster, and the mount point on the client is also /home/filecluster.

First, edit the hosts file on all three machines:

192.168.0.201 xen1 
192.168.0.202 xen2
192.168.0.203 xen3

Next, download fuse, glusterfs, and python-ctypes:

wget http://download.gluster.com/pub/gluster/glusterfs/3.2/LATEST/glusterfs-3.2.2.tar.gz
wget "http://downloads.sourceforge.net/project/fuse/fuse-2.X/2.8.5/fuse-2.8.5.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Ffuse%2Ffiles%2Ffuse-2.X%2F2.8.5%2F&ts=1313661051&use_mirror=cdnetworks-kr-2" -O fuse-2.8.5.tar.gz
wget http://download.fedora.redhat.com/pub/epel/5/x86_64/python-ctypes-1.0.2-2.el5.x86_64.rpm

Install python-ctypes, fuse, and glusterfs:

rpm -ivh python-ctypes-1.0.2-2.el5.x86_64.rpm
tar zxvf fuse-2.8.5.tar.gz && cd fuse-2.8.5 && ./configure && make && make install && cd ..
tar zxvf glusterfs-3.2.2.tar.gz && cd glusterfs-3.2.2 && ./configure --enable-fusermount && make && make install && cd ..
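
Because glusterfs is built from source into /usr/local, its shared libraries may not be on the runtime linker path. Assuming /usr/local/lib is not already listed in your ld.so.conf (check first), a quick way to register it is:

# assumption: /usr/local/lib is not yet on the linker path on this CentOS 5 box
echo "/usr/local/lib" > /etc/ld.so.conf.d/glusterfs.conf
ldconfig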

After installation, /etc/init.d/glusterd is generated automatically. Set the fuse kernel module to load automatically at boot:

echo "modprobe fuse" > /etc/sysconfig/modules/fuse.modules 
chmod 755 /etc/sysconfig/modules/fuse.modules
modprobe fuse
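
To confirm the module is actually loaded, a quick sanity check:

lsmod | grep fuse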

Add glusterd to the services started at boot:

chkconfig glusterd on 

Modify the /etc/init.d/glusterd file. On the server side:

#!/bin/bash
#
# chkconfig: 35 90 12
# description: Gluster File System service for volume management
#

# Get function from functions library
. /etc/rc.d/init.d/functions

BASE=glusterd
GLUSTERFSD=glusterfsd
GLUSTERFS=glusterfs
GLUSTERD_BIN=/usr/local/sbin/$BASE
GLUSTERD_OPTS="-l /var/log/glusterfs.log -f /usr/local/etc/glusterfs/glusterfsd.vol"
GLUSTERD="$GLUSTERD_BIN $GLUSTERD_OPTS"
RETVAL=0

# Start the service $BASE
start()
{
    echo -n $"Starting $BASE:"
    daemon $GLUSTERD
    RETVAL=$?
    echo
    [ $RETVAL -ne 0 ] && exit $RETVAL
}

# Stop the service $BASE
stop()
{
    echo -n $"Stopping $BASE:"
    killproc $BASE
    echo
    pidof -c -o %PPID -x $GLUSTERFSD &> /dev/null
    [ $? -eq 0 ] && killproc $GLUSTERFSD &> /dev/null

    #pidof -c -o %PPID -x $GLUSTERFS &> /dev/null
    #[ $? -eq 0 ] && killproc $GLUSTERFS &> /dev/null

    if [ -f /etc/glusterd/nfs/run/nfs.pid ]; then
        pid=`cat /etc/glusterd/nfs/run/nfs.pid`
        cmd=`ps -p $pid -o comm=`

        if [ "$cmd" == "glusterfs" ]; then
            kill `cat /etc/glusterd/nfs/run/nfs.pid`
        fi
    fi
}

### service arguments ###
case $1 in
    start)
        start
        ;;
    stop)
        stop
        ;;
    status)
        status $BASE
        ;;
    restart)
        $0 stop
        $0 start
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart}."
        exit 1
esac

exit 0

On the client side:

#!/bin/bash
#
# chkconfig: 35 90 12
# description: Gluster File System service for volume management
#

# Get function from functions library
. /etc/rc.d/init.d/functions

BASE=glusterd
GLUSTERFSD=glusterfsd
GLUSTERFS=glusterfs
GLUSTERD_BIN=/usr/local/sbin/$GLUSTERFS
GLUSTERD_OPTS="-l /var/log/glusterfs.log -f /usr/local/etc/glusterfs/glusterfs.vol /home/filecluster"
GLUSTERD="$GLUSTERD_BIN $GLUSTERD_OPTS"
RETVAL=0

# Start the service $BASE
start()
{
    echo -n $"Starting $GLUSTERFS:"
    daemon $GLUSTERD
    RETVAL=$?
    echo
    [ $RETVAL -ne 0 ] && exit $RETVAL
}

# Stop the service $BASE
stop()
{
    echo -n $"Stopping $GLUSTERFS:"
    killproc $GLUSTERFS
    echo
    pidof -c -o %PPID -x $GLUSTERFSD &> /dev/null
    [ $? -eq 0 ] && killproc $GLUSTERFSD &> /dev/null

    #pidof -c -o %PPID -x $GLUSTERFS &> /dev/null
    #[ $? -eq 0 ] && killproc $GLUSTERFS &> /dev/null

    if [ -f /etc/glusterd/nfs/run/nfs.pid ]; then
        pid=`cat /etc/glusterd/nfs/run/nfs.pid`
        cmd=`ps -p $pid -o comm=`

        if [ "$cmd" == "glusterfs" ]; then
            kill `cat /etc/glusterd/nfs/run/nfs.pid`
        fi
    fi
}

### service arguments ###
case $1 in
    start)
        start
        ;;
    stop)
        stop
        ;;
    status)
        status $BASE
        ;;
    restart)
        $0 stop
        $0 start
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart}."
        exit 1
esac

exit 0

Modify /usr/local/etc/glusterfs/glusterfsd.vol on the servers. The two servers differ only in the option bind-address line; everything else is identical.

### file: server-volume.vol.sample

#####################################
### GlusterFS Server Volume File ##
#####################################

#### CONFIG FILE RULES:
### "#" is comment character.
### - Config file is case sensitive
### - Options within a volume block can be in any order.
### - Spaces or tabs are used as delimiter within a line.
### - Multiple values to options will be : delimited.
### - Each option should end within a line.
### - Missing or commented fields will assume default values.
### - Blank/commented lines are allowed.
### - Sub-volumes should already be defined above before referring.

### Export volume "brick" with the contents of the "/home/filecluster" directory.
volume brick
  type storage/posix                  # POSIX FS translator
  option directory /home/filecluster  # Export this directory
end-volume

volume locker
  type features/posix-locks
  subvolumes brick
end-volume

### Add network serving capability to above brick.
volume server
  type protocol/server
  option transport-type tcp/server
# option transport-type unix
# option transport-type ib-sdp
  option bind-address 192.168.0.202   # Default is to listen on all interfaces; change to 192.168.0.203 on xen3
# option listen-port 9999

# option transport-type ib-verbs
# option transport.ib-verbs.bind-address 192.168.1.10 # Default is to listen on all interfaces
# option transport.ib-verbs.listen-port 24016
# option transport.ib-verbs.work-request-send-size 131072
# option transport.ib-verbs.work-request-send-count 64
# option transport.ib-verbs.work-request-recv-size 131072
# option transport.ib-verbs.work-request-recv-count 64

# option client-volume-filename /etc/glusterfs/glusterfs-client.vol
  subvolumes brick
# NOTE: Access to any volume through protocol/server is denied by
# default. You need to explicitly grant access through the "auth"
# option.
  option auth.addr.brick.allow 192.168.0.*    # Allow access to "brick" volume
  option auth.addr.locker.allow 192.168.0.*   # Allow access to "locker" volume
end-volume
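
Before wiring the vol file into the init script, you can sanity-check it by running glusterfsd once in the foreground with debug logging. The flags below are the usual ones for this release, but double-check them against glusterfsd --help on your build:

/usr/local/sbin/glusterfsd -f /usr/local/etc/glusterfs/glusterfsd.vol -N -L DEBUG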

Modify /usr/local/etc/glusterfs/glusterfs.vol on the client:

### Add client feature and attach to remote subvolume
volume xen2
  type protocol/client
  option transport-type tcp/client
  option remote-host xen2
  option remote-port 24007
  option remote-subvolume locker   # name of the remote volume
end-volume

volume xen3
  type protocol/client
  option transport-type tcp/client
  option remote-host xen3
  option remote-port 24007
  option remote-subvolume locker
end-volume

#volume replicate2
# type cluster/replicate
# subvolumes xen2
#end-volume
#
#volume replicate3
# type cluster/replicate
# subvolumes xen3
#end-volume
#
volume bricks
  type cluster/replicate
  subvolumes xen2 xen3
# subvolumes replicate1
end-volume
#
#volume writebehind
# type performance/write-behind
# option cache-size 1MB
# subvolumes distribute
#end-volume
#
#volume cache
# type performance/io-cache
# option cache-size 64MB
# subvolumes writebehind
#end-volume

Finally, start the glusterd service on the servers:

/etc/init.d/glusterd start 
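
At this point the server process should be listening on the GlusterFS port (the client configuration above points at port 24007). A quick way to confirm, assuming net-tools is installed:

netstat -nltp | grep gluster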

Then start it on the client (there the init script launches the glusterfs client process, which mounts the volume on /home/filecluster):

/etc/init.d/glusterd start 
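
The client init script is only a thin wrapper; with the variables defined above it boils down to calling the glusterfs client directly with the vol file and the mount point, so the mount can also be done by hand:

/usr/local/sbin/glusterfs -l /var/log/glusterfs.log -f /usr/local/etc/glusterfs/glusterfs.vol /home/filecluster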

After that, df on the client shows something like this:

[root@xen1 filecluster]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1 29G 3.5G 24G 13% /
tmpfs 512M 0 512M 0% /dev/shm
glusterfs#/usr/local/etc/glusterfs/glusterfs.vol
29G 3.3G 25G 12% /home/filecluster

I then ran a simple comparison test against NFS:

1. NFSv4
   dd if=/dev/zero of=xen.img bs=1M count=500
   524288000 bytes (524 MB) copied, 13.9683 seconds, 37.5 MB/s
   dd if=/dev/zero of=xen.img bs=1M count=32
   33554432 bytes (34 MB) copied, 0.710816 seconds, 47.2 MB/s
2. gluster
   dd if=/dev/zero of=xen.img bs=1M count=500
   524288000 bytes (524 MB) copied, 18.4192 seconds, 28.5 MB/s
   dd if=/dev/zero of=xen.img bs=1M count=32
   33554432 bytes (34 MB) copied, 0.591001 seconds, 56.8 MB/s

Of course the server side can consist of more than two machines, but since I am using replication here, adding more servers would mostly be a waste: in replication mode every server holds the same data, so the usable capacity stays that of a single server. Gluster offers the following volume types (a sketch of a non-replicated variant follows the list):

1. Distributed Volumes
2. Replicated Volumes
3. Striped Volumes
4. Distributed Striped Volumes
5. Distributed Replicated Volumes
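
With hand-written vol files like the ones above, the volume type is decided by the cluster translator in the client config. As an illustration only (not part of my setup), swapping cluster/replicate for cluster/distribute in the client glusterfs.vol would give a distributed volume that spreads files across xen2 and xen3 instead of mirroring them:

# hypothetical distribute variant of the "bricks" volume in glusterfs.vol
volume bricks
  type cluster/distribute
  subvolumes xen2 xen3
end-volume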

Before a system goes live it has to go through some failure testing. Here I mainly test server-side failures, since that is where all the data will live. To simulate a server dying while the client is writing, start a large file write on the client and, while it is running, stop the glusterd service on xen2:

dd if=/dev/zero of=xen1.img bs=1M count=500 

Once the client finishes writing, look at the size of xen1.img on all three machines:

[root@xen1 filecluster]# ll
total 512508
-rw-r--r-- 1 root root 524288000 08-24 15:33 xen1.img

[root@xen2 filecluster]# ll
total 241652
-rw-r--r-- 1 root root 247201792 08-24 15:32 xen1.img

[root@xen3 filecluster]# ll
total 512508
-rw-r--r-- 1 root root 524288000 08-24 15:33 xen1.img

You can see that the file on xen2 has the wrong size. Now start glusterd on xen2 again. After it comes back, the listing still looks the same as before, but if we run the following on the client:

[root@xen1 filecluster]# ll 

then the file size on xen2 becomes correct as well. But does this trick still work when subdirectories are involved? On the client run:

mkdir file && cd file && dd if=/dev/zero of=xen1.img bs=1M count=500

With all other conditions the same as before, it turns out that if you only run ls in /home/filecluster, the data on xen2 is still wrong. So once we notice that a server has gone down and come back, we have to run the following command on the client to get everything fully back in sync:

[root@xen1 filecluster]# find ./ -name "*" | xargs ls
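
Walking the whole tree like this works because, on this version of GlusterFS, accessing a file through the client mount is what triggers self-heal of its replicas. A variant that stats every entry without dumping the listing to the terminal (my own habit, not part of the original test) is:

find /home/filecluster -noleaf -print0 | xargs --null stat > /dev/null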