Using GlusterFS for File Server High Availability

A project I am working on needs a highly available file server. I had previously considered an nfs+rsync+inotify+keepalived style setup, but it has quite a few problems: once the primary NFS server dies, every client that has the share mounted has to remount it. So I turned to a distributed file system instead; there are plenty of articles online comparing the options. I picked GlusterFS mainly because its deployment is flexible, it has no single point of failure, and it can be mounted directly (via FUSE).

Below is the actual deployment. The operating system is CentOS 5.6 x86_64. The client's IP is 192.168.0.201, and the servers' IPs are 192.168.0.202 and 192.168.0.203. The directory exported on the servers is /home/filecluster, and the mount point on the client is also /home/filecluster.

First, edit the hosts file on all three machines:

192.168.0.201 xen1 
192.168.0.202 xen2
192.168.0.203 xen3

Next, download fuse, glusterfs, and python-ctypes:

wget http://download.gluster.com/pub/gluster/glusterfs/3.2/LATEST/glusterfs-3.2.2.tar.gz
wget "http://downloads.sourceforge.net/project/fuse/fuse-2.X/2.8.5/fuse-2.8.5.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Ffuse%2Ffiles%2Ffuse-2.X%2F2.8.5%2F&ts=1313661051&use_mirror=cdnetworks-kr-2" -O fuse-2.8.5.tar.gz
wget http://download.fedora.redhat.com/pub/epel/5/x86_64/python-ctypes-1.0.2-2.el5.x86_64.rpm

Install python-ctypes, fuse, and glusterfs:

rpm -ivh python-ctypes-1.0.2-2.el5.x86_64.rpm
tar zxvf fuse-2.8.5.tar.gz && cd fuse-2.8.5 && ./configure && make && make install && cd ..
tar zxvf glusterfs-3.2.2.tar.gz && cd glusterfs-3.2.2 && ./configure --enable-fusermount && make && make install && cd ..
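
Because glusterfs is built from source into /usr/local, its shared libraries may not be on the runtime linker path. Assuming /usr/local/lib is not already listed in your ld.so.conf (check first), a quick way to register it is:

# assumption: /usr/local/lib is not yet on the linker path on this CentOS 5 box
echo "/usr/local/lib" > /etc/ld.so.conf.d/glusterfs.conf
ldconfig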

After installation, /etc/init.d/glusterd is generated automatically. Set the fuse kernel module to load automatically at boot:

echo "modprobe fuse" > /etc/sysconfig/modules/fuse.modules 
chmod 755 /etc/sysconfig/modules/fuse.modules
modprobe fuse
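
To confirm the module is actually loaded, a quick sanity check:

lsmod | grep fuse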

Add glusterd to the services started at boot:

chkconfig glusterd on 

Modify the /etc/init.d/glusterd file. On the server side:

#!/bin/bash
#
# chkconfig: 35 90 12
# description: Gluster File System service for volume management
#

# Get function from functions library
. /etc/rc.d/init.d/functions

BASE=glusterd
GLUSTERFSD=glusterfsd
GLUSTERFS=glusterfs
GLUSTERD_BIN=/usr/local/sbin/$BASE
GLUSTERD_OPTS="-l /var/log/glusterfs.log -f /usr/local/etc/glusterfs/glusterfsd.vol"
GLUSTERD="$GLUSTERD_BIN $GLUSTERD_OPTS"
RETVAL=0

# Start the service $BASE
start()
{
    echo -n $"Starting $BASE:"
    daemon $GLUSTERD
    RETVAL=$?
    echo
    [ $RETVAL -ne 0 ] && exit $RETVAL
}

# Stop the service $BASE
stop()
{
    echo -n $"Stopping $BASE:"
    killproc $BASE
    echo
    pidof -c -o %PPID -x $GLUSTERFSD &> /dev/null
    [ $? -eq 0 ] && killproc $GLUSTERFSD &> /dev/null

    #pidof -c -o %PPID -x $GLUSTERFS &> /dev/null
    #[ $? -eq 0 ] && killproc $GLUSTERFS &> /dev/null

    if [ -f /etc/glusterd/nfs/run/nfs.pid ]; then
        pid=`cat /etc/glusterd/nfs/run/nfs.pid`
        cmd=`ps -p $pid -o comm=`

        if [ "$cmd" == "glusterfs" ]; then
            kill `cat /etc/glusterd/nfs/run/nfs.pid`
        fi
    fi
}

### service arguments ###
case $1 in
    start)
        start
        ;;
    stop)
        stop
        ;;
    status)
        status $BASE
        ;;
    restart)
        $0 stop
        $0 start
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart}."
        exit 1
esac

exit 0

On the client side:

#!/bin/bash
#
# chkconfig: 35 90 12
# description: Gluster File System service for volume management
#

# Get function from functions library
. /etc/rc.d/init.d/functions

BASE=glusterd
GLUSTERFSD=glusterfsd
GLUSTERFS=glusterfs
GLUSTERD_BIN=/usr/local/sbin/$GLUSTERFS
GLUSTERD_OPTS="-l /var/log/glusterfs.log -f /usr/local/etc/glusterfs/glusterfs.vol /home/filecluster"
GLUSTERD="$GLUSTERD_BIN $GLUSTERD_OPTS"
RETVAL=0

# Start the service $BASE
start()
{
    echo -n $"Starting $GLUSTERFS:"
    daemon $GLUSTERD
    RETVAL=$?
    echo
    [ $RETVAL -ne 0 ] && exit $RETVAL
}

# Stop the service $BASE
stop()
{
    echo -n $"Stopping $GLUSTERFS:"
    killproc $GLUSTERFS
    echo
    pidof -c -o %PPID -x $GLUSTERFSD &> /dev/null
    [ $? -eq 0 ] && killproc $GLUSTERFSD &> /dev/null

    #pidof -c -o %PPID -x $GLUSTERFS &> /dev/null
    #[ $? -eq 0 ] && killproc $GLUSTERFS &> /dev/null

    if [ -f /etc/glusterd/nfs/run/nfs.pid ]; then
        pid=`cat /etc/glusterd/nfs/run/nfs.pid`
        cmd=`ps -p $pid -o comm=`

        if [ "$cmd" == "glusterfs" ]; then
            kill `cat /etc/glusterd/nfs/run/nfs.pid`
        fi
    fi
}

### service arguments ###
case $1 in
    start)
        start
        ;;
    stop)
        stop
        ;;
    status)
        status $BASE
        ;;
    restart)
        $0 stop
        $0 start
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart}."
        exit 1
esac

exit 0

Modify /usr/local/etc/glusterfs/glusterfsd.vol on the servers. The two servers differ only in the option bind-address line; everything else is identical.

### file: server-volume.vol.sample

#####################################
### GlusterFS Server Volume File ##
#####################################

#### CONFIG FILE RULES:
### "#" is comment character.
### - Config file is case sensitive
### - Options within a volume block can be in any order.
### - Spaces or tabs are used as delimiter within a line.
### - Multiple values to options will be : delimited.
### - Each option should end within a line.
### - Missing or commented fields will assume default values.
### - Blank/commented lines are allowed.
### - Sub-volumes should already be defined above before referring.

### Export volume "brick" with the contents of the "/home/filecluster" directory.
volume brick
  type storage/posix                  # POSIX FS translator
  option directory /home/filecluster  # Export this directory
end-volume

volume locker
  type features/posix-locks
  subvolumes brick
end-volume

### Add network serving capability to above brick.
volume server
  type protocol/server
  option transport-type tcp/server
# option transport-type unix
# option transport-type ib-sdp
  option bind-address 192.168.0.202   # Default is to listen on all interfaces; change to 192.168.0.203 on xen3
# option listen-port 9999

# option transport-type ib-verbs
# option transport.ib-verbs.bind-address 192.168.1.10 # Default is to listen on all interfaces
# option transport.ib-verbs.listen-port 24016
# option transport.ib-verbs.work-request-send-size 131072
# option transport.ib-verbs.work-request-send-count 64
# option transport.ib-verbs.work-request-recv-size 131072
# option transport.ib-verbs.work-request-recv-count 64

# option client-volume-filename /etc/glusterfs/glusterfs-client.vol
  subvolumes brick
# NOTE: Access to any volume through protocol/server is denied by
# default. You need to explicitly grant access through the "auth"
# option.
  option auth.addr.brick.allow 192.168.0.*    # Allow access to "brick" volume
  option auth.addr.locker.allow 192.168.0.*   # Allow access to "locker" volume
end-volume
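
Before wiring the vol file into the init script, you can sanity-check it by running glusterfsd once in the foreground with debug logging. The flags below are the usual ones for this release, but double-check them against glusterfsd --help on your build:

/usr/local/sbin/glusterfsd -f /usr/local/etc/glusterfs/glusterfsd.vol -N -L DEBUG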

Modify /usr/local/etc/glusterfs/glusterfs.vol on the client:

### Add client feature and attach to remote subvolume
volume xen2
  type protocol/client
  option transport-type tcp/client
  option remote-host xen2
  option remote-port 24007
  option remote-subvolume locker   # name of the remote volume
end-volume

volume xen3
  type protocol/client
  option transport-type tcp/client
  option remote-host xen3
  option remote-port 24007
  option remote-subvolume locker
end-volume

#volume replicate2
# type cluster/replicate
# subvolumes xen2
#end-volume
#
#volume replicate3
# type cluster/replicate
# subvolumes xen3
#end-volume
#
volume bricks
  type cluster/replicate
  subvolumes xen2 xen3
# subvolumes replicate1
end-volume
#
#volume writebehind
# type performance/write-behind
# option cache-size 1MB
# subvolumes distribute
#end-volume
#
#volume cache
# type performance/io-cache
# option cache-size 64MB
# subvolumes writebehind
#end-volume

Finally, start the glusterd service on the servers:

/etc/init.d/glusterd start 
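
At this point the server process should be listening on the GlusterFS port (the client configuration above points at port 24007). A quick way to confirm, assuming net-tools is installed:

netstat -nltp | grep gluster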

Then start it on the client (there the init script launches the glusterfs client process, which mounts the volume on /home/filecluster):

/etc/init.d/glusterd start 
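
The client init script is only a thin wrapper; with the variables defined above it boils down to calling the glusterfs client directly with the vol file and the mount point, so the mount can also be done by hand:

/usr/local/sbin/glusterfs -l /var/log/glusterfs.log -f /usr/local/etc/glusterfs/glusterfs.vol /home/filecluster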

After that, df on the client shows something like this:

[root@xen1 filecluster]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1 29G 3.5G 24G 13% /
tmpfs 512M 0 512M 0% /dev/shm
glusterfs#/usr/local/etc/glusterfs/glusterfs.vol
29G 3.3G 25G 12% /home/filecluster

I then ran a simple comparison test against NFS:

1. NFSv4
   dd if=/dev/zero of=xen.img bs=1M count=500
   524288000 bytes (524 MB) copied, 13.9683 seconds, 37.5 MB/s
   dd if=/dev/zero of=xen.img bs=1M count=32
   33554432 bytes (34 MB) copied, 0.710816 seconds, 47.2 MB/s
2. gluster
   dd if=/dev/zero of=xen.img bs=1M count=500
   524288000 bytes (524 MB) copied, 18.4192 seconds, 28.5 MB/s
   dd if=/dev/zero of=xen.img bs=1M count=32
   33554432 bytes (34 MB) copied, 0.591001 seconds, 56.8 MB/s

Of course the server side can consist of more than two machines, but since I am using replication here, adding more servers would mostly be a waste: in replication mode every server holds the same data, so the usable capacity stays that of a single server. Gluster offers the following volume types (a sketch of a non-replicated variant follows the list):

1. Distributed Volumes
2. Replicated Volumes
3. Striped Volumes
4. Distributed Striped Volumes
5. Distributed Replicated Volumes
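
With hand-written vol files like the ones above, the volume type is decided by the cluster translator in the client config. As an illustration only (not part of my setup), swapping cluster/replicate for cluster/distribute in the client glusterfs.vol would give a distributed volume that spreads files across xen2 and xen3 instead of mirroring them:

# hypothetical distribute variant of the "bricks" volume in glusterfs.vol
volume bricks
  type cluster/distribute
  subvolumes xen2 xen3
end-volume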

Before a system goes live it has to go through some failure testing. Here I mainly test server-side failures, since that is where all the data will live. To simulate a server dying while the client is writing, start a large file write on the client and, while it is running, stop the glusterd service on xen2:

dd if=/dev/zero of=xen1.img bs=1M count=500 

Once the client finishes writing, look at the size of xen1.img on all three machines:

[root@xen1 filecluster]# ll
total 512508
-rw-r--r-- 1 root root 524288000 08-24 15:33 xen1.img

[root@xen2 filecluster]# ll
total 241652
-rw-r--r-- 1 root root 247201792 08-24 15:32 xen1.img

[root@xen3 filecluster]# ll
total 512508
-rw-r--r-- 1 root root 524288000 08-24 15:33 xen1.img

You can see that the file on xen2 has the wrong size. Now start glusterd on xen2 again. After it comes back, the listing still looks the same as before, but if we run the following on the client:

[root@xen1 filecluster]# ll 

then the file size on xen2 becomes correct as well. But does this trick still work when subdirectories are involved? On the client run:

mkdir file && cd file && dd if=/dev/zero of=xen1.img bs=1M count=500

With all other conditions the same as before, it turns out that if you only run ls in /home/filecluster, the data on xen2 is still wrong. So once we notice that a server has gone down and come back, we have to run the following command on the client to get everything fully back in sync:

[root@xen1 filecluster]# find ./ -name "*" | xargs ls
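
Walking the whole tree like this works because, on this version of GlusterFS, accessing a file through the client mount is what triggers self-heal of its replicas. A variant that stats every entry without dumping the listing to the terminal (my own habit, not part of the original test) is:

find /home/filecluster -noleaf -print0 | xargs --null stat > /dev/null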