Gluster
Download the GlusterFS tarball. The easiest way to build it is to use the RPM spec file provided in the tarball.
On Infiniband machines:
rpmbuild -ta glusterfs-3.2.7.tar.gz
On non-Infiniband machines:
rpmbuild -ta glusterfs-3.2.7.tar.gz --without rdma
This will create some RPM files. Move them out of ~/rpmbuild/RPMS/x86_64/ into some convenient location.
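Once built, the packages can be installed on each node. A minimal sketch, assuming the default rpmbuild output directory; the scratch directory ~/gluster-rpms is illustrative:

# Copy the freshly built packages somewhere convenient
mkdir -p ~/gluster-rpms
mv ~/rpmbuild/RPMS/x86_64/glusterfs-*.rpm ~/gluster-rpms/

# Install (or upgrade) them on the node; install only the sub-packages you need,
# e.g. skip the rdma package on non-Infiniband machines
rpm -Uvh ~/gluster-rpms/glusterfs-*.rpm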
In initial testing with Gluster 3.1.2, the RDMA transport (a fast, Infiniband-based transport) proved to be buggy, so all nodes use the IP interface instead. On Infiniband-only nodes this means IP-over-Infiniband (IPoIB) carries the traffic. This is selected with the transport-type socket option. It more than halves the transport speed, but it appears to be necessary. New tests should be run against newer Gluster releases to determine whether this is still the case.
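Before relying on IPoIB, it may be worth confirming that the Infiniband interface actually carries an IP address and is reachable. A quick sanity check, assuming the interface is named ib0 and the addressing matches the 192.168.2.x scheme used in the configurations below:

# Check that the IPoIB interface is up and has an address assigned
ip addr show ib0

# From a client, verify the server is reachable over that interface
ping -c 3 192.168.2.1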
The server is configured as follows (/etc/glusterfs/glusterd.vol):
### Export volume "shared" with the contents of "/sharedlocal" directory.
volume shared
  type storage/posix             # POSIX FS translator
  option directory /sharedlocal  # Export this directory
end-volume

### Add network serving capability to above shared.
volume server
  type protocol/server
  option transport-type socket
  option transport.rdma.bind-address 192.168.2.1
  option transport.socket.bind-address 192.168.2.1
  option transport.tcp.bind-address 192.168.2.1
  option transport.rdma.work-request-send-size 65536
  option transport.rdma.work-request-send-count 100
  option transport.rdma.work-request-recv-size 65536
  option transport.rdma.work-request-recv-count 100
  option transport.rdma.mtu 2048
  subvolumes shared
  option auth.addr.shared.allow *  # Allow access to "shared" volume
end-volume
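How the server process is started depends on the packaging, so treat the following as a sketch. With a static volume file like the one above, glusterfsd can be pointed at it directly, or the init script shipped with the RPMs can be used (verify that the init script actually reads this volume file on your installation):

# Start the brick server directly from the static volume file
glusterfsd -f /etc/glusterfs/glusterd.vol

# Or enable and start the packaged service
chkconfig glusterd on
service glusterd start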
Each client is configured as follows (/etc/glusterfs/glusterfs.vol):
### Add client feature and attach to remote subvolume
volume client
  type protocol/client
  option transport-type socket
  option remote-host 192.168.2.1   # IP address of the remote brick
  option remote-subvolume shared   # name of the remote volume
  option ping-timeout 600
end-volume
Note that setting ping-timeout to 600 is strongly recommended. High load can cause a brick to stop responding for longer than the default timeout, which is quite short. If ping-timeout is reached, individual file accesses fail outright rather than simply stalling until the brick recovers, which is highly disruptive.
The filesystem is mounted in the usual manner, with the following line in /etc/fstab:
/etc/glusterfs/glusterfs.vol /shared glusterfs defaults 0 0
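Before adding the fstab entry, the mount can be tested by hand; a minimal sketch, assuming the /shared mount point does not yet exist:

# Create the mount point and mount the volume from the volume file
mkdir -p /shared
mount -t glusterfs /etc/glusterfs/glusterfs.vol /shared

# Equivalently, invoke the FUSE client directly
glusterfs -f /etc/glusterfs/glusterfs.vol /shared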