Wednesday, November 14, 2012

Finding space for Solaris Live Upgrade

Something that is often perceived to be an obstacle to using Solaris Live Upgrade is finding space to give to Live upgrade. There are fortunately quite a few options to help out.

Oracle of course recommends that you use spare disks or buy more disks. That is all well and fine for big corporates with deep pockets... assuming that you have slots available to plug in more disks.

So … on to the more attainable options.

To start off it helps to know how much space you will need to perform the actual upgrade. Solaris itself needs about 7GB for a Full plus OEM installation, excluding logs, crash dumps, spool files, and so on. Use the df command to check how much space is used by the root and other critical file systems (/var, /opt, /usr). While this is a good starting point, you may not need to replicate all of that if it includes 3rd-party software that stays the same.

Since you are also able to combine and split file systems, as well as explicitly “exclude” portions of the boot environment through the -x or -f options, it is possible to get an estimate of the amount of space needed from Live Upgrade. To find out how much space Live Upgrade will need, run the lucreate command that you plan on using up to the point where it displays the estimated space requirement, and then press Ctrl-C to abort it.

Option 1. By far the simplest scenario is if you are running on a ZFS root system already, then you are in luck: Live Upgrade has got good support for ZFS root, at least in recent versions of the tool, eg since Solaris 10 Update 6. It can take a snapshot of the root file system(s) and create a clone automatically, and then simply apply changes, like patches or an upgrade, to the clone.

ZFS makes it almost too easy, the command is simply:

# lucreate -c Sol10u5 -n Sol10u8

The command will automatically locate the Active Boot Environment (PBE) and utilise free space from the pool (It checks the active BE to determine what ZFS pool is to be used). The -c option above causes the Active BE to be named explicitly (Sol10u5 in this example), and the -n assigns a name for the new Alternate BE, eg Sol10u8. (Let's just assume I'm going from update 5 to update 8)

It is a good idea to name your BEs based on the version of the operating system that they have, especially on ZFS; With ZFS it is (too) easy to have many BEs, eg for testing and backup purposes.

When using ZFS clones, the space required to create a new BE is less than with other file systems. This is because the contents of the “clone” points at the same blocks on disk as the original source data. Once a block is written to (from either the clone or the origin), the Copy-On-Write part of ZFS takes care of the space allocation. Data that doesn't change will not be duplicated!

You can therefore safely use the traditional methods for estimating your disk space requirements and rest assured that you will in practice need less than that.

Option 2: Another ZFS pool, other than the one that you boot from, may have free space, or you may want to move to another pool on separate disks for any other reason. When you explicitly specify a ZFS pool different from the source pool, Live Upgrade will copy the contents in stead of cloning. Assuming a target ZFS pool name of “NEWPOOL, the command would be:

# lucreate -c Sol10u5 -n Sol10u8 -p NEWPOOL

As before the active BE is probed to determine the source for live upgrade.

Note that I as a habit use upper-case names for my ZFS pool names. That is because I like them so much. It is also because it makes them stand out as pool names in command outputs, particularly df and mount!

Not really a separate option as such, but worth mentioning here: With ZFS boot being new, people often want to migrate from UFS to a ZFS root – The commands are the same as when migrating from one ZFS pool to another – once again the source is automatically based on the active BE and only the destination is specified.

You must be running (or going to upgrade to) at least Solaris 10 release 10/08 (S10 update 6) in order to utilize ZFS root. If running Solaris earlier than Update 2 then it will not be possible to use ZFS since the kernel must also support ZFS, not only the Live upgrade tools.

In the below example I create the new ZFS root pool using the drive c0t0d1:

# zpool create RPOOL c0t0d1s1
# lucreate -c Sol10u5 -n Sol10u8 -p RPOOL

The lucreate command will copy the active BE into the new ZFS pool. Note: You don't have to actually upgrade. Once the copy (create process) completes, run luactivate and reboot to switch over to ZFS.

# luactivate Sol10u8
# init 6

After checking the system, clean up …

# ludelete Sol10u5
# prtvtoc /dev/rdsk/c0t0d1s2 | fmthard -s - /dev/rdsk/c0t0d0s2
# zpool attach RPOOL c0t0d1s1 c0t0d0s1

I want to highlight that I specified partition (slice) numbers above. Generally the recommendation is to gives ZFS access to the “whole disk”, but for booting it is a requirement to specify a slice.

A few extra considerations: The second disk is not automatically bootable, but rather than being redundant I will just link this excellent information

Now that you are on ZFS root you should also configure swap and dump “the ZFS way” - see here

If you choose for whatever reason not to move to ZFS root yet, maybe you are still not running Solaris 10 update 6 or later that supports ZFS booting, then you still have some options.

Option 3: Check whether you have free, unallocated space on any disks. The “prtvtoc” command will show areas on disk that are not part of any partition, as in the below example:

# prtvtoc /dev/dsk/c1t1d0s2

If any space is not allocated to a partition, there will be a section in the output before the partition table like this

* Unallocated space: 
*     First  Sector    Last 
*     Sector Count    Sector 
*     2097600 4293395776  526079 
*     8835840 4288229056  2097599 

If so, create a slice to overlay those specific cylinders (I do this carefully by hand on a case-by-case basis), and then use the newly created slice.

Note: A Solaris partition is called a disk-slice by the disk management tools. On X86, there is a separate concept called a partition, which is a BIOS partitioning. In this situation, all Solaris disk slices exist inside the Solaris tagged partition.

Option 4: If you do not have unallocated space on any disks, you might still have unused slices... Be careful though – unmounted is not the same as unused! Check with your DBAs whether they are using any raw partitions, ie partitions without a mounted file system. I've also seen cases where people unmount their backup file systems as a “security” measure, though the value in that is debatable.

It may be worth mentioning that when looking for space, you can use any disks in the system, it does not have to just be one of the first two disks, or even an internal disk.

To specify a slice to use as target you use the -m option of lucreate, eg

# lucreate -c Sol10u5 -n Sol10u8 -m /:/dev/dsk/c0t0d1s6:ufs

The above command will use /dev/dsk/c0t0d1s6 as the target for the root file system on the new BE.

You can also use SVM meta-devices. For example

# metainit d106 c0t0d1s6
# lucreate -c Sol10u5 -n Sol10u8 -m /:/dev/md/dsk/d106:ufs

Or on a mirror (assuming two free slices)

# metainit -f d11 1 1 c0t0d0s0
# metainit -f d12 1 1 c0t1d0s0
# metainit d10 -m d11 d12
# lucreate -c Sol10u5 -n Sol10u8 -m /:/dev/md/dsk/d10:ufs

Note that the traditional “metaroot” step is left to Live-upgrade to handle, and that the mirror volume in the example is created without syncing because the slices are both blank! You could always rather attach the second sub-mirror in the traditional way just to be safe.

Option 5: Split /var and the root file systems. If you have two slices somewhere but neither is large enough to hold the entire system, this could work. Then after completing the upgrade, you can delete the old BE to free up the original boot disk, and “migrate” back to that. It involves a bit of work, but you would use Live upgrade for this migration, which is exactly the kind of thing that makes Solaris so beautiful.

The commands to split out /var from root would look like this.

# lucreate -c Sol10u5 -n Sol10u8 -m /:/dev/md/dsk/d10:ufs -m /var:/dev/md/dsk/d11:ufs

When you compare this with the previous example you will notice there is an extra -m option for /var. Each mount point specified with -m will become a separate file system in the target BE. Adding an extra entry for /usr or any other file system works in same way. To better understand the -m options, think of them as instructions to Live upgrade about how to build the vfstab file for the new BE.

Note that non-critical file systems, eg anything other than root, /var, /usr and /opt are automatically kept separate and considered as shared between BEs.

Option 6: Temporarily deploy a swap partition or slice to use as a root file system. This would work if you have “spare” swap space. Don't scoff - I've many a times seen systems that have swap space configured purely for purposes of saving core dumps. The commands would be

# swap -x /dev/dsk/c0t0d0s0
# lucreate -c Sol10u5 -n Sol10u8 -m /:/dev/dsk/c0t0d0s0:ufs

There would be some clean-up work left once everything is done, for example deleting the old BE and creating a new swap partition from that space.

Option 7: A final option is to break an existing SVM mirror. In this case it will not be necessary to copy the file system over to the target, because due to the mirror, it is already there. The meta-device for the sub-mirror is also already there. We will however create a new single-sided SVM Mirrored volume from this sub-mirror for this process.

To do this you specify two things: A new “mirror” volume, as well as the sub-mirror to be detached and then attached to the new mirror volume.

Assuming we have d10 with sub-mirrors d11 and d12, we will create a new mirror volume called d100. We will remove d12 from d10, and attach it to d100. A single lucreate command takes care of all of that:

# lucreate -c Sol10u5 -n Sol10u8 -m /:/dev/md/dsk/d100:mirror,ufs -m /:/dev/md/dsk/d12:detach,attach,preserve

To examine the above command: You can see that -m is specified twice, both times for root. The first have the tag or “mirror,ufs” and it creates the new mirror volume. The second have tags “detach,attach,preserve”. Detach: Live upgrade needs to detach it first. Attach: Do not use it directly, in stead attach it to the volume. Preserve: No need to reformat and copy the file system contents.

In stead of breaking a mirror and re-using the sub-mirrors, lucreate can set up the SVM meta-devices, for example:

# lucreate -c Sol10u5 -n Sol10u8 -m /:/dev/md/dsk/d100:mirror,ufs -m /:/dev/dsk/c1d0t0s3,d12:attach

Comparing to the previous example you will notice that the device specifier in the second field of the second -m option lists a physical disk slice as well as a name for a new meta-device. You will also notice that the only tag is “attach” because the new device doesn't need to be detached, and can't be “preserved” since it doesn't have any data.

Option 8: If you have your root file systems mirrored with Veritas Volume manager, and there is no other free space large enough to hold a root file system, then I suggest that you manually break the mirror to free up a disk, rather than try to use the vxlu* scripts.

I have not personally had access to a VXVM based system in years but from the rough time many people apparently have, based on the questions I see in forums, I would recommend that you un-encapsulate, perform the upgrade, and then finally then re-encapsulate.

Option 9: If you have some rarely accessed data on disk you may have the option of temporarily moving that onto another system or even a tape in order to free up space. After completing the upgrade you can restore this data.

Option 10: Move 3rd party applications and remove crash dumps, stale log files, old copies of Recommended patch clusters, and the likes, to other disks or file systems. This actually isn't a separate recommendation – it is something you should be doing in any case, in addition to any other options you use. This should be "Option # 0"

These files, if residing in the critical file systems, will be copied unless you expressly exclude them.
With ZFS root and snapshots it is less of an issue – the snapshot doesn't duplicate data until a change is written to a file. This however could create the reverse of the problem: An update to “shared” files that lives in a critical file system, will not be replicated back to the original BE because data in a cloned file system is treated as not shared!

You probably can not exclude your applications or software directories, so instead do this: First move the application directory to any shared file system. Then create a soft-link from the location where the directory used to be, to where you moved it to. I have yet to encounter an application that will not allow itself to be relocated in this way, and can confirm that it works fine for SAP, Oracle applications, Siebel and SAS, as well as many other “infrastructure” software components, like Connect Direct, Control-M, TNG, Netbackup, etc.

A few more notes:

  1. Swap devices are shared between boot environments if you do not specifically handle them. In most cases this default behaviour should be sufficient.
  2. If you have /var as a separate file system, it will be “merged” it into the root file system, unless expressly specified with a separate -m option. This is true for all the critical file systems: root, /var, /usr and /opt
  3. On the other hand, all shareable, non-critical file systems are handled as shared by default. This means they will not be copied, merged, or have any changes done to them, and will be used as is and in place.
  4. To merge a non-critical file system into its parent, use the special “merged” device as as the target in the -m option. For example will merge /home into the root
    # lucreate -c Sol10u5 -n Sol10u8 -m /:/dev/md/dsk/d100:ufs -m /home:merged:ufs

In this article I have not really spoken about the actual upgrade that happens after the new BE is created. I've posted about it in the past and in most cases it is already well documented on the web.

Another very interesting subject is what happens during luactivate! I'll leave that for another day! It is a real pity that oracle is depreciating Live upgrade, but it will still be around for a while.