Thursday, February 17, 2011

Live Upgrade to install the recommended patch cluster on a ZFS snapshot

Live Upgrade used to require that you find some free slices (partitions) and then fiddle with the -R "alternate Root" options to install the patch cluster to an ABE. With ZFS all of those pains have just ... gone away ...

Nowadays Live Upgrade on ZFS doesn't even copy the installation; instead it automatically clones a snapshot of the boot environment, saving a lot of time and disk space! Even the patch install script is geared towards patching an Alternate Boot Environment!

The patching process involves six steps:

  1. Apply Pre-requisite patches
  2. Create an Alternate Boot Environment
  3. Apply the patch cluster to this ABE
  4. Activate the ABE
  5. Reboot
  6. Cleanup

Note: The system remains online throughout all except the reboot step.

In preparation you uncompress the downloaded patch cluster file. I created a ZFS file system, mounted it on /patches, and extracted the cluster in there. Furthermore, you have to read the cluster README file - it contains a "password" needed to install, as well as information about pre-requisites and gotchas. Read the file. This is your job!
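
For reference, the preparation boiled down to something like the commands below (the path to the downloaded zip file is illustrative - use wherever you saved it):

# zfs create -o mountpoint=/patches rpool/patches
# cd /patches
# unzip /path/to/10_x86_Recommended.zip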

The pre-requisites are essentially just patches to the patching tools themselves (patchadd and friends), conveniently included in the Patch Cluster!

Step 1 - Install the pre-requisites for applying the cluster to the ABE

# cd /patches/10_x86_Recommended
# ./installcluster --apply-prereq

Note - If you get an error due to insufficient space in /var/run, see my previous blog post, or the section on adding extra swap space at the end of this post!
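
Since the root cause is a shortage of free swap, a quick check of the numbers before you kick off the cluster can save you a restart, for example:

# swap -s
# df -k /var/run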

Step 2 - Create an Alternate boot environment (ABE)

# lucreate -c s10u9 -n s10u9patched -p rpool

Checking GRUB menu...
Analyzing system configuration.
No name for current boot environment.
Current boot environment is named <s10u9>.
Creating initial configuration for primary boot environment <s10u9>.
The device </dev/dsk/c1t0d0s0> is not a root device for any boot environment; cannot get BE ID.
PBE configuration successful: PBE name <s10u9> PBE Boot Device </dev/dsk/c1t0d0s0>.
Comparing source boot environment <s10u9> file systems with the file
system(s) you specified for the new boot environment. Determining which
file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
Creating configuration for boot environment <s10u9patched>.
Source boot environment is <s10u9>.
Creating boot environment <s10u9patched>.
Cloning file systems from boot environment <s10u9> to create boot environment <s10u9patched>.
Creating snapshot for <rpool/ROOT/s10_0910> on <rpool/ROOT/s10_0910@s10u9patched>.
Creating clone for <rpool/ROOT/s10_0910@s10u9patched> on <rpool/ROOT/s10u9patched>.
Setting canmount=noauto for </> in zone <global> on <rpool/ROOT/s10u9patched>.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <s10u9patched> as <mount-point>//boot/grub/menu.lst.prev.
File </boot/grub/menu.lst> propagation successful
Copied GRUB menu from PBE to ABE
No entry for BE <s10u9patched> in GRUB menu
Population of boot environment <s10u9patched> successful.
Creation of boot environment <s10u9patched> successful.

There is now an extra boot environment to which we can apply the Patch Cluster. Note - for what it is worth, if you just needed a test environment to play in, you could at this point luactivate the alternate boot environment, reboot into it, and make whatever changes you like to the active system. If it breaks, all it takes to undo any and all changes is to activate the original boot environment again and reboot, as sketched below.
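
As a rough sketch, that sandbox workflow (using the boot environment names from this example) would be:

# luactivate s10u9patched
# init 6
(play around in the clone, break things at will)
# luactivate s10u9
# init 6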

Step 3 - Apply the patch cluster to the BE named s10u9patched.

# cd /patches/10_x86_Recommended
# ./installcluster -B s10u9patched

I am not showing the long and boring output from the installcluster script as this blog post is already far too long. The patching runs for quite a while - plan for at least two hours. Monitor the process and check the log for warnings. Depending on how long it has been since patches were last applied, some far-reaching patches may be applied which can affect your ability to log in after rebooting. Again: READ the README!
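
If you want to sanity-check the result before activating it, you can mount the patched environment under a temporary mount point and poke around; something along these lines (the tail is just there to keep the patch listing short):

# lumount s10u9patched /mnt
# patchadd -R /mnt -p | tail
# luumount s10u9patched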

Step 4 - Activate the ABE.

# luactivate s10u9patched
System has findroot enabled GRUB
Generating boot-sign, partition and slice information for PBE <s10u9>
A Live Upgrade Sync operation will be performed on startup of boot environment <s10u9patched>.

Generating boot-sign for ABE <s10u9patched>
Generating partition and slice information for ABE <s10u9patched>
Copied boot menu from top level dataset.
Generating multiboot menu entries for PBE.
Generating multiboot menu entries for ABE.
Disabling splashimage
Re-enabling splashimage
No more bootadm entries. Deletion of bootadm entries is complete.
GRUB menu default setting is unaffected
Done eliding bootadm entries.

**********************************************************************

The target boot environment has been activated. It will be used when you
reboot. NOTE: You MUST NOT USE the reboot, halt, or uadmin commands. You
MUST USE either the init or the shutdown command when you reboot. If you
do not use either init or shutdown, the system will not boot using the
target BE.

**********************************************************************

In case of a failure while booting to the target BE, the following process
needs to be followed to fallback to the currently working boot environment:

1. Boot from the Solaris failsafe or boot in Single User mode from Solaris
Install CD or Network.

2. Mount the Parent boot environment root slice to some directory (like
/mnt). You can use the following commands in sequence to mount the BE:

     zpool import rpool
     zfs inherit -r mountpoint rpool/ROOT/s10_0910
     zfs set mountpoint=<mountpointName> rpool/ROOT/s10_0910
     zfs mount rpool/ROOT/s10_0910

3. Run <luactivate> utility with out any arguments from the Parent boot
environment root slice, as shown below:

     <mountpointName>/sbin/luactivate

4. luactivate, activates the previous working boot environment and
indicates the result.

5. Exit Single User mode and reboot the machine.

**********************************************************************

Modifying boot archive service
Propagating findroot GRUB for menu conversion.
File </etc/lu/installgrub.findroot> propagation successful
File </etc/lu/stage1.findroot> propagation successful
File </etc/lu/stage2.findroot> propagation successful
File </etc/lu/GRUB_capability> propagation successful
Deleting stale GRUB loader from all BEs.
File </etc/lu/installgrub.latest> deletion successful
File </etc/lu/stage1.latest> deletion successful
File </etc/lu/stage2.latest> deletion successful
Activation of boot environment <s10u9patched> successful.

# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
s10u9                      yes      no     no        yes    -
s10u9patched               yes      yes    yes       no     -

Carefully take note of the details on how to recover from a failure. Making a hard copy of this is not a bad idea! Also note that you have to use either init or shutdown to effect the reboot, as the other commands will circumvent some of the delayed-action scripts! Hence ...
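
An easy way to keep such a copy is to capture the luactivate output when you run it in Step 4, for example (the file name is just a suggestion):

# luactivate s10u9patched 2>&1 | tee /var/tmp/luactivate-s10u9patched.txt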

Step 5 - Reboot using shutdown or init ...

# init 6

Monitor the boot-up sequence. A few handy commands while you are performing the upgrade include:

# lustatus
# bootadm list-menu
# zfs list -t all

You will eventually (after confirming that everything works as expected) want to free up the disk space held by the snapshots. The first command below cleans up the redundant Live Upgrade entries as well as the associated ZFS snapshot storage! The second removes the temporary ZFS file system used for the patching.

Step 6 - Cleanup

# ludelete s10u9
# zfs destroy rpool/patches
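
To confirm that the old boot environment and its snapshot really are gone, the same handy commands from above do the job:

# lustatus
# zfs list -t all -r rpool/ROOT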

Again no worries about where the space comes from. ZFS simply manages it! Live Upgrade takes care of your GRUB boot menu and gives you clear instructions on how to recover if anything goes wrong.

Adding a ZFS zvol for extra swap space

ZFS sometimes truly takes the thinking out of allocating and managing space on your file systems. But only sometimes.

Many operations on Solaris, OpenSolaris and Indiana will cause you to run into swap space issues. For example, using the new Solaris 10 VirtualBox appliance, you will get the following message when you try to install the Recommended Patch Cluster:

Insufficient space available in /var/run to complete installation of this patch
set. On supported configurations, /var/run is a tmpfs filesystem resident in
swap. Additional free swap is required to proceed applying further patches. To
increase the available free swap, either add new storage resources to swap
pool, or reboot the system. This script may then be rerun to continue
installation of the patch set.

This is fixed easily enough by adding more swap space, like this:

# zfs create -V 1GB -b $(pagesize) rpool/swap2
# zfs set refreservation=1GB rpool/swap2
# swap -a /dev/zvol/dsk/rpool/swap2
# swap -l
swapfile             dev  swaplo blocks   free
/dev/zvol/dsk/rpool/swap 181,2       8 1048568 1048568
/dev/zvol/dsk/rpool/swap2 181,1       8 2097144 2097144

Setting the reservation is important, particularly if you plan on making the change permanent, e.g. by adding the new zvol as a swap entry in /etc/vfstab. ZFS does not otherwise reserve the space for swapping, so without the refreservation the swap system may think there is space available which isn't actually there.
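
For the permanent variant, the vfstab entry would look something like the line below (the standard swap-device format; adjust the zvol name to match your own):

/dev/zvol/dsk/rpool/swap2   -   -   swap   -   no   -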

The -b option sets the volblocksize to match the host architecture's memory page size, which improves swap performance by aligning the volume's I/O units on disk with the pages being swapped (4 KB on x86 systems and 8 KB on SPARC, as reported by the pagesize command).
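
You can confirm the page size for your platform directly; on an x86 system it reports:

# pagesize
4096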

If this is just temporary, then cleaning up afterwards is just as easy:

# swap -d /dev/zvol/dsk/rpool/swap2
# zfs destroy rpool/swap2

It is also possible to grow the existing swap volume. To do so, set a new size and refreservation for the existing volume like this:

# swap -d /dev/zvol/dsk/rpool/swap
# zfs set volsize=2g rpool/swap
# zfs set refreservation=2g rpool/swap
# swap -a /dev/zvol/dsk/rpool/swap

And finally, it is possible to do the above without first deleting the swap device (something the system may refuse to do if the swap space is in use), by using the following "trick":

# zfs set volsize=2g rpool/swap
# zfs set refreservation=2g rpool/swap
# swap -l | awk '/rpool.swap/ {print $3+$4}'|read OFFSET
# env NOINUSE_CHECK=1 swap -a /dev/zvol/dsk/rpool/swap $OFFSET

The above calculates the offset of the newly added space in the swap zvol and adds it as a second "device" to the list of swap devices, so the extra space is used immediately. The offset shows up as the "swaplo" value in the swap -l output. (Note that the | read OFFSET construct relies on ksh, which runs the last command of a pipeline in the current shell; in bash the variable would be lost.) Multiple swap devices on the same physical media are not ideal, but on the next reboot (or after deleting and re-adding the swap device) the system will recognise the full size of the volume as a single device.

No worries about where the space comes from. ZFS just allocates it! The flip side of the coin is that once you have quotas, reservations, and indirect allocations such as those from snapshots, figuring out where your space has gone can become quite tricky! I'll blog about that some time!
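
As a teaser, a good starting point for that kind of detective work is the space summary built into zfs list (available on recent Solaris 10 releases):

# zfs list -o space -r rpool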