====== Cluster in a Box ======

A VirtualBox Solaris Cluster complete with iSCSI quorum disks!

===== Host System =====

The host system is a speedy laptop:

  * Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz
  * 6GB RAM
  * GeForce GTX 260M

At the time of this article, I'm running Ubuntu 10.04 (lucid) with VirtualBox 3.2.8. The Solaris Cluster is 3.3 running in a Solaris 10u9 virtual machine. Here's an {{:prophead:s10u9sc33.tgz|export}} of the appliance if you want to use it. It has the networks defined, so all you need to do is create a hard drive and load the software.

VirtualBox will also require a //Host-only Network//, which is configured under //File// -> //Preferences// -> //Network//. I called mine //vboxnet0// and configured it as such:

^ IPv4 Address | 192.168.1.254 |
^ IPv4 Network Mask | 255.255.255.0 |

I don't use IPv6, so I just went with the defaults.

===== Shared Storage =====

A cluster will require some sort of shared storage. This can be accomplished in VirtualBox by using either iSCSI or fixed-size virtual disks.

===== Fixed Size Virtual Disks =====

If you don't care to set up iSCSI, you can use shared virtual storage in later releases of VirtualBox.

Create the disks:

  $ for f in 0 1 2 3 4 5; do
  > VBoxManage createhd --filename shared-$f.vdi --size 5000 --format VDI --variant Fixed
  > done

Attach the disks to an existing VM:

  $ for f in 0 1 2 3 4 5; do
  > VBoxManage storageattach node1 --storagectl "SCSI Controller" --port $f --device 0 --type hdd --medium shared-$f.vdi
  > done

Mark the disks shareable:

  $ for f in 0 1 2 3 4 5; do
  > VBoxManage modifyhd shared-$f.vdi --type shareable
  > done

You may now attach these drives to multiple VMs and use them as shared storage.

===== iSCSI =====

Before the cluster software is installed into the virtual machines, iSCSI needs to be installed and configured on the Linux host system.

  - **Create Disk Image Files**\\ iSCSI will require a few disk image files, and this can be done with dd.
The following will create six 1GB disk image files:

  cd /some/place/with/plenty/of/storage
  for i in {0..5}; do dd if=/dev/zero of=lun$i.img bs=1024k count=1000; done

  - **Install iSCSI Packages**

  sudo apt-get install iscsitarget

  - **Configure iSCSI**\\ Add these entries to /etc/ietd.conf:

  Target iqn.2010-04.us.webcave.linux-host:storage
          Lun 0 Path=/some/place/with/plenty/of/storage/lun0.img,Type=fileio
          Lun 1 Path=/some/place/with/plenty/of/storage/lun1.img,Type=fileio
          Lun 2 Path=/some/place/with/plenty/of/storage/lun2.img,Type=fileio
          Lun 3 Path=/some/place/with/plenty/of/storage/lun3.img,Type=fileio
          Lun 4 Path=/some/place/with/plenty/of/storage/lun4.img,Type=fileio
          Lun 5 Path=/some/place/with/plenty/of/storage/lun5.img,Type=fileio

Then edit /etc/default/iscsitarget and enable the target:

  ISCSITARGET_ENABLE=true

  - **Start the Service**

  /etc/init.d/iscsitarget start

===== Virtual Machines =====

  - Create two Solaris 10 virtual machines using this appliance {{:prophead:s10u9sc33.tgz|export}}. You will need to attach a 15GB hard disk to each machine. Install Solaris 10u9 into each machine, selecting at least the //End User// package cluster (I went with //All//). When finishing the install, make sure you select a default router pointing to the virtual network adapter of the host system.
  - Log into the virtual machines using the command line login option and change a few things before we get started:
    - Edit /etc/hosts on both systems and add an entry for the other cluster node(s).
    - Disable the graphical login:

  svcadm disable cde-login

    - Set up a decent /.profile:

  #
  # let's use a real shell
  #
  if [ "$0" = "-sh" -a "$SHELL" = "/sbin/sh" ]; then
          SHELL="/usr/bin/bash"; export SHELL
          exec $SHELL -l
  fi

  if [ "$SHELL" = "/usr/bin/bash" ]; then
          set -o vi
          export PAGER=less
          export PATH=/usr/sbin:/usr/bin:/usr/cluster/bin:/usr/cluster/lib/sc:/usr/ccs/bin:/usr/proc/bin:/usr/sfw/bin
          export MANPATH=/usr/share/man:/usr/cluster/man:/usr/sfw/man
          if [ "`tty`" != "/dev/console" ]; then
                  PROMPT_COMMAND='echo -ne "\033]0;${LOGNAME}@${HOSTNAME}: ${PWD}\007"'
          fi
          export PS1="${LOGNAME}@${HOSTNAME} # "
  fi

    - Enable root logins by editing /etc/default/login and commenting out the //CONSOLE// entry:

  vi /etc/default/login
  ...
  # If CONSOLE is set, root can only login on that device.
  # Comment this line out to allow remote login by root.
  #
  #CONSOLE=/dev/console
  ...

    - Enable root ssh logins by editing /etc/ssh/sshd_config and changing //PermitRootLogin// to yes:

  vi /etc/ssh/sshd_config
  ...
  # Are root logins permitted using sshd.
  # Note that sshd uses pam_authenticate(3PAM) so the root (or any other) user
  # maybe denied access by a PAM module regardless of this setting.
  # Valid options are yes, without-password, no.
  PermitRootLogin yes
  ...

Notify sshd of the change:

  pkill -HUP sshd

  - If you chose to use iSCSI, set up the iSCSI initiator in the VMs:

  iscsiadm add static-config iqn.2010-04.us.webcave.linux-host:storage,192.168.1.254:3260
  iscsiadm modify discovery --static enable

This should add the iSCSI disks, and you can verify this by looking at format. If the disks are now present in format, label them.

  # format
  Searching for disks...done

  AVAILABLE DISK SELECTIONS:
         0. c0d0
            /pci@0,0/pci-ide@1,1/ide@0/cmdk@0,0
         1. c3t1d0
            /iscsi/disk@0000iqn.2010-04.us.webcave.linux-host%3Astorage0001,0
         2. c3t2d0
            /iscsi/disk@0000iqn.2010-04.us.webcave.linux-host%3Astorage0001,1
         3. c3t3d0
            /iscsi/disk@0000iqn.2010-04.us.webcave.linux-host%3Astorage0001,2
         4. c3t4d0
            /iscsi/disk@0000iqn.2010-04.us.webcave.linux-host%3Astorage0001,3
         5. c3t5d0
            /iscsi/disk@0000iqn.2010-04.us.webcave.linux-host%3Astorage0001,4
         6. c3t6d0
            /iscsi/disk@0000iqn.2010-04.us.webcave.linux-host%3Astorage0001,5
  Specify disk (enter its number):

===== Cluster Install =====

Install the cluster software in the VMs but choose the option to //Configure Later// and reboot.

Run //scinstall// on the first node:

   *** Main Menu ***

    Please select from one of the following (*) options:

      * 1) Create a new cluster or add a cluster node
        2) Configure a cluster to be JumpStarted from this install server
        3) Manage a dual-partition upgrade
        4) Upgrade this cluster node
      * 5) Print release information for this cluster node

      * ?) Help with menu options
      * q) Quit

    Option:  1

   *** New Cluster and Cluster Node Menu ***

    Please select from any one of the following options:

        1) Create a new cluster
        2) Create just the first node of a new cluster on this machine
        3) Add this machine as a node in an existing cluster

        ?) Help with menu options
        q) Return to the Main Menu

    Option:  2

Continue answering the configuration questions, but disable //automatic quorum device selection//. Your install should look like this:

  >>> Confirmation <<<

    Your responses indicate the following options to scinstall:

      scinstall -i \
           -C crash \
           -F \
           -G lofi \
           -T node=node1,node=node2,authtype=sys \
           -w netaddr=172.16.0.0,netmask=255.255.240.0,maxnodes=64,maxprivatenets=10,numvirtualclusters=12 \
           -A trtype=dlpi,name=e1000g1 -A trtype=dlpi,name=e1000g2 \
           -B type=switch,name=switch1 -B type=switch,name=switch2 \
           -m endpoint=:e1000g1,endpoint=switch1 \
           -m endpoint=:e1000g2,endpoint=switch2

    Are these the options you want to use (yes/no) [yes]?
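While the first node reboots, a quick aside on the iSCSI setup above: if //format// shows no iSCSI disks after the static-config step, the usual culprit is a typo in the IQN, which fails silently. Here's a small host-side sanity check you can run anywhere (a sketch; ''valid_iqn'' is a hypothetical helper, and the pattern only covers the common ''iqn.YYYY-MM.reversed-domain[:name]'' form from RFC 3720):

```shell
# Sanity-check an iSCSI qualified name (IQN) before handing it to iscsiadm.
# Accepts only the common iqn.YYYY-MM.reversed-domain[:identifier] form.
valid_iqn() {
    printf '%s\n' "$1" |
        grep -Eq '^iqn\.[0-9]{4}-(0[1-9]|1[0-2])\.[a-z0-9.-]+(:[^ ]+)?$'
}

# The first IQN is the one used in this article; the others are typos.
for iqn in \
    iqn.2010-04.us.webcave.linux-host:storage \
    iqn.2010-13.us.webcave.linux-host:storage \
    badname; do
    if valid_iqn "$iqn"; then
        echo "OK   $iqn"
    else
        echo "BAD  $iqn"
    fi
done
```

For the three names above, this prints OK for the first and BAD for the other two (month 13 and a missing ''iqn.'' prefix).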
Once the first node has rebooted, run //scinstall// on the second node:

   *** Main Menu ***

    Please select from one of the following (*) options:

      * 1) Create a new cluster or add a cluster node
        2) Configure a cluster to be JumpStarted from this install server
        3) Manage a dual-partition upgrade
        4) Upgrade this cluster node
      * 5) Print release information for this cluster node

      * ?) Help with menu options
      * q) Quit

    Option:  1

   *** New Cluster and Cluster Node Menu ***

    Please select from any one of the following options:

        1) Create a new cluster
        2) Create just the first node of a new cluster on this machine
        3) Add this machine as a node in an existing cluster

        ?) Help with menu options
        q) Return to the Main Menu

    Option:  3

Answer the install questions, and when finished, the scinstall command should look like this:

  >>> Confirmation <<<

    Your responses indicate the following options to scinstall:

      scinstall -i \
           -C crash \
           -N node1 \
           -G lofi \
           -A trtype=dlpi,name=e1000g1 -A trtype=dlpi,name=e1000g2 \
           -m endpoint=:e1000g1,endpoint=switch1 \
           -m endpoint=:e1000g2,endpoint=switch2

    Are these the options you want to use (yes/no) [yes]?

After all nodes have been installed and added to the cluster, the shared storage (either iSCSI or shared virtual disks) must be set for software fencing.
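A DID device counts as shared when //cldev list -v// reports a path to it from more than one node. With a handful of disks you can eyeball that, but it's also easy to pick them out mechanically by counting paths per DID (a sketch; the listing from the next section is embedded here as sample input, and on a real node you would pipe ''cldev list -v'' straight into the awk):

```shell
# Pick the shared disks out of `cldev list -v` output: a DID device with
# a path on more than one node is shared storage.  NR > 2 skips the two
# header lines; sort -k1.2n orders the dNN names numerically.
shared=$(awk 'NR > 2 { paths[$1]++ }
              END { for (d in paths) if (paths[d] > 1) print d }' <<'EOF' | sort -k1.2n
DID Device          Full Device Path
----------          ----------------
d1                  node1:/dev/rdsk/c0d0
d2                  node1:/dev/rdsk/c1t0d0
d3                  node2:/dev/rdsk/c0d0
d4                  node2:/dev/rdsk/c1t0d0
d5                  node1:/dev/rdsk/c3t1d0
d5                  node2:/dev/rdsk/c2t1d0
d6                  node2:/dev/rdsk/c2t2d0
d6                  node1:/dev/rdsk/c3t2d0
d7                  node2:/dev/rdsk/c2t3d0
d7                  node1:/dev/rdsk/c3t3d0
d8                  node2:/dev/rdsk/c2t4d0
d8                  node1:/dev/rdsk/c3t4d0
d9                  node2:/dev/rdsk/c2t5d0
d9                  node1:/dev/rdsk/c3t5d0
d10                 node2:/dev/rdsk/c2t6d0
d10                 node1:/dev/rdsk/c3t6d0
EOF
)
echo $shared   # -> d5 d6 d7 d8 d9 d10
```

The local boot and install disks (d1 through d4) each have a single path, so only d5 through d10 fall out, matching what we find by hand below.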
Obtain a list of the shared disks so we know which //did// devices to change:

  # cldev list -v
  DID Device          Full Device Path
  ----------          ----------------
  d1                  node1:/dev/rdsk/c0d0
  d2                  node1:/dev/rdsk/c1t0d0
  d3                  node2:/dev/rdsk/c0d0
  d4                  node2:/dev/rdsk/c1t0d0
  d5                  node1:/dev/rdsk/c3t1d0
  d5                  node2:/dev/rdsk/c2t1d0
  d6                  node2:/dev/rdsk/c2t2d0
  d6                  node1:/dev/rdsk/c3t2d0
  d7                  node2:/dev/rdsk/c2t3d0
  d7                  node1:/dev/rdsk/c3t3d0
  d8                  node2:/dev/rdsk/c2t4d0
  d8                  node1:/dev/rdsk/c3t4d0
  d9                  node2:/dev/rdsk/c2t5d0
  d9                  node1:/dev/rdsk/c3t5d0
  d10                 node2:/dev/rdsk/c2t6d0
  d10                 node1:/dev/rdsk/c3t6d0

The shared disks are d5 through d10, so we'll change the fencing option for these devices to //nofencing//:

  # for i in 5 6 7 8 9 10; do
  > cldev set -p default_fencing=nofencing d$i
  > done

We can verify this setting:

  # for i in 5 6 7 8 9 10; do
  > cldev show d$i
  > done

  === DID Device Instances ===

  DID Device Name:                                /dev/did/rdsk/d5
    Full Device Path:                                node2:/dev/rdsk/c2t1d0
    Full Device Path:                                node1:/dev/rdsk/c3t1d0
    Replication:                                     none
    default_fencing:                                 nofencing

  === DID Device Instances ===

  DID Device Name:                                /dev/did/rdsk/d6
    Full Device Path:                                node1:/dev/rdsk/c3t2d0
    Full Device Path:                                node2:/dev/rdsk/c2t2d0
    Replication:                                     none
    default_fencing:                                 nofencing

  === DID Device Instances ===

  DID Device Name:                                /dev/did/rdsk/d7
    Full Device Path:                                node1:/dev/rdsk/c3t3d0
    Full Device Path:                                node2:/dev/rdsk/c2t3d0
    Replication:                                     none
    default_fencing:                                 nofencing

  === DID Device Instances ===

  DID Device Name:                                /dev/did/rdsk/d8
    Full Device Path:                                node1:/dev/rdsk/c3t4d0
    Full Device Path:                                node2:/dev/rdsk/c2t4d0
    Replication:                                     none
    default_fencing:                                 nofencing

  === DID Device Instances ===

  DID Device Name:                                /dev/did/rdsk/d9
    Full Device Path:                                node1:/dev/rdsk/c3t5d0
    Full Device Path:                                node2:/dev/rdsk/c2t5d0
    Replication:                                     none
    default_fencing:                                 nofencing

  === DID Device Instances ===

  DID Device Name:                                /dev/did/rdsk/d10
    Full Device Path:                                node1:/dev/rdsk/c3t6d0
    Full Device Path:                                node2:/dev/rdsk/c2t6d0
    Replication:                                     none
    default_fencing:                                 nofencing

Now that we have changed the fencing protocol, add a quorum disk:

  # clq add d5

Check the disk and make sure the //Access Mode// is set to //sq_disk// (software quorum):

  # clq show d5

  === Quorum Devices ===

  Quorum Device Name:                             d5
    Enabled:                                         yes
    Votes:                                           1
    Global Name:                                     /dev/did/rdsk/d5s2
    Type:                                            shared_disk
    Access Mode:                                     sq_disk
    Hosts (enabled):                                 node1, node2

We can now turn off installmode:

  # cluster set -p installmode=disabled
  # clq reset
  # clq show

  === Cluster Nodes ===

  Node Name:                                      node1
    Node ID:                                         1
    Quorum Vote Count:                               1
    Reservation Key:                                 0x4C927F9400000001

  Node Name:                                      node2
    Node ID:                                         2
    Quorum Vote Count:                               1
    Reservation Key:                                 0x4C927F9400000002

  === Quorum Devices ===

  Quorum Device Name:                             d5
    Enabled:                                         yes
    Votes:                                           1
    Global Name:                                     /dev/did/rdsk/d5s2
    Type:                                            shared_disk
    Access Mode:                                     sq_disk
    Hosts (enabled):                                 node1, node2
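Why one quorum disk is enough here: each node contributes one vote and d5 contributes a third, so the cluster has three votes total and stays up as long as a strict majority (two) is present. That means one surviving node plus the quorum disk can keep running after the other node dies, whereas a bare two-node cluster would lose quorum. The majority rule is just integer arithmetic (a plain-shell sketch, not a cluster command):

```shell
# Quorum math for the two-node cluster above: the cluster needs a strict
# majority of all configured votes to stay up.
node_votes=2          # node1 + node2, one vote each (see clq show)
quorum_dev_votes=1    # the d5 quorum disk
total=$((node_votes + quorum_dev_votes))
needed=$((total / 2 + 1))
echo "total=$total needed=$needed"   # -> total=3 needed=2

# One node down: the survivor (1 vote) plus the quorum disk (1 vote)
# still reaches the majority, so the cluster keeps running.
surviving=$((1 + quorum_dev_votes))
[ "$surviving" -ge "$needed" ] && echo "cluster survives a single node failure"
```

Without the quorum device, total would be 2 and needed would be 2 as well, so any single node failure would take the whole cluster down, which is exactly what the iSCSI luns or shared virtual disks buy us.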