
Thursday, February 17, 2011

Live Upgrade to install the recommended patch cluster on a ZFS snapshot

Live Upgrade used to require that you find some free slices (partitions) and then fidget with the -R "alternate Root" options to install the patch cluster to an ABE. With ZFS all of those pains have just ... gone away ...

Nowadays Live Upgrade on ZFS doesn't even copy the installation; instead it automatically clones a snapshot of the boot environment, saving much time and disk space. Even the patch install script is geared towards patching an Alternate Boot Environment!

The patching process involves six steps:

  1. Apply Pre-requisite patches
  2. Create an Alternate Boot Environment
  3. Apply the patch cluster to this ABE
  4. Activate the ABE
  5. Reboot
  6. Cleanup

Note: The system remains online throughout all except the reboot step.

In preparation, uncompress the downloaded patch cluster file. I created a ZFS file system, mounted it on /patches, and extracted the cluster there. Furthermore, you have to read the cluster README file - it contains a "password" needed to install, as well as information about pre-requisites and gotchas. Read the file. This is your job!
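The preparation can be sketched as follows (the dataset name and the exact name of the downloaded zip file are assumptions - adjust them to match your download):

# zfs create rpool/patches
# zfs set mountpoint=/patches rpool/patches
# cd /patches
# unzip /path/to/10_x86_Recommended.zip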

The pre-requisites are essentially just patches to the patchadd tooling, conveniently included in the Patch Cluster!

Step 1 - Install the pre-requisites for applying the cluster to the ABE

# cd /patches/10_x86_Recommended
# ./installcluster --apply-prereq

Note - If you get an Error due to insufficient space in /var/run, see my previous blog post here!

Step 2 - Create an Alternate boot environment (ABE)

# lucreate -c s10u9 -n s10u9patched -p rpool

Checking GRUB menu...
Analyzing system configuration.
No name for current boot environment.
Current boot environment is named <s10u9>.
Creating initial configuration for primary boot environment <s10u9>.
The device </dev/dsk/c1t0d0s0> is not a root device for any boot environment; cannot get BE ID.
PBE configuration successful: PBE name <s10u9> PBE Boot Device </dev/dsk/c1t0d0s0>.
Comparing source boot environment <s10u9> file systems with the file
system(s) you specified for the new boot environment. Determining which
file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
Creating configuration for boot environment <s10u9patched>.
Source boot environment is <s10u9>.
Creating boot environment <s10u9patched>.
Cloning file systems from boot environment <s10u9> to create boot environment <s10u9patched>.
Creating snapshot for <rpool/ROOT/s10_0910> on <rpool/ROOT/s10_0910@s10u9patched>.
Creating clone for <rpool/ROOT/s10_0910@s10u9patched> on <rpool/ROOT/s10u9patched>.
Setting canmount=noauto for </> in zone <global> on <rpool/ROOT/s10u9patched>.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <s10u9patched> as <mount-point>//boot/grub/menu.lst.prev.
File </boot/grub/menu.lst> propagation successful
Copied GRUB menu from PBE to ABE
No entry for BE <s10u9patched> in GRUB menu
Population of boot environment <s10u9patched> successful.
Creation of boot environment <s10u9patched> successful.

There is now an extra boot environment to which we can apply the Patch Cluster. Note - for what it is worth, if you just needed a test environment to play in, you can now luactivate the alternate boot environment and then make any changes to the active system. If the system breaks, all it takes to undo any and all changes is a reboot.

Step 3 - Apply the patch cluster to the BE named s10u9patched.

# cd /patches/10_x86_Recommended
# ./installcluster -B s10u9patched

I am not showing the long and boring output from the installcluster script as this blog post is already far too long. The patching runs for quite a while - plan for at least two hours. Monitor the process and check the log for warnings. Depending on how long it has been since patches were last applied, some significant patches may be included which can affect your ability to log in after rebooting. Again: READ the README!

Step 4 - Activate the ABE.

# luactivate s10u9patched
System has findroot enabled GRUB
Generating boot-sign, partition and slice information for PBE <s10u9>
A Live Upgrade Sync operation will be performed on startup of boot environment <s10u9patched>.

Generating boot-sign for ABE <s10u9patched>
Generating partition and slice information for ABE <s10u9patched>
Copied boot menu from top level dataset.
Generating multiboot menu entries for PBE.
Generating multiboot menu entries for ABE.
Disabling splashimage
Re-enabling splashimage
No more bootadm entries. Deletion of bootadm entries is complete.
GRUB menu default setting is unaffected
Done eliding bootadm entries.

**********************************************************************

The target boot environment has been activated. It will be used when you
reboot. NOTE: You MUST NOT USE the reboot, halt, or uadmin commands. You
MUST USE either the init or the shutdown command when you reboot. If you
do not use either init or shutdown, the system will not boot using the
target BE.

**********************************************************************

In case of a failure while booting to the target BE, the following process
needs to be followed to fallback to the currently working boot environment:

1. Boot from the Solaris failsafe or boot in Single User mode from Solaris
Install CD or Network.

2. Mount the Parent boot environment root slice to some directory (like
/mnt). You can use the following commands in sequence to mount the BE:

     zpool import rpool
     zfs inherit -r mountpoint rpool/ROOT/s10_0910
     zfs set mountpoint=<mountpointName> rpool/ROOT/s10_0910
     zfs mount rpool/ROOT/s10_0910

3. Run <luactivate> utility with out any arguments from the Parent boot
environment root slice, as shown below:

     <mountpointName>/sbin/luactivate

4. luactivate, activates the previous working boot environment and
indicates the result.

5. Exit Single User mode and reboot the machine.

**********************************************************************

Modifying boot archive service
Propagating findroot GRUB for menu conversion.
File </etc/lu/installgrub.findroot> propagation successful
File </etc/lu/stage1.findroot> propagation successful
File </etc/lu/stage2.findroot> propagation successful
File </etc/lu/GRUB_capability> propagation successful
Deleting stale GRUB loader from all BEs.
File </etc/lu/installgrub.latest> deletion successful
File </etc/lu/stage1.latest> deletion successful
File </etc/lu/stage2.latest> deletion successful
Activation of boot environment <s10u9patched> successful.

# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
s10u9                      yes      no     no        yes    -
s10u9patched               yes      yes    yes       no     -

Carefully take note of the details on how to recover from a failure. Making a hard-copy of this is not a bad idea! Take note that you have to use either init or shutdown to effect the reboot, as the other commands will circumvent some of the delayed action scripts! Hence ...

Step 5 - Reboot using shutdown or init ...

# init 6

Monitor the boot-up sequence. A few handy commands while you are performing the upgrade include:

# lustatus
# bootadm list-menu
# zfs list -t all

You will eventually (after confirming that everything works as expected) want to free up the disk space held by the snapshots. The first command cleans up the redundant Live Upgrade entries as well as the relevant ZFS snapshot storage! The second is to remove the temporary ZFS file system used for the patching.

Step 6 - Cleanup

# ludelete s10u9
# zfs destroy rpool/patches

Again, no worries about where the space comes from - ZFS simply manages it! Live Upgrade takes care of your GRUB boot menu and gives you clear instructions on how to recover if anything goes wrong.

Adding a ZFS zvol for extra swap space

ZFS sometimes truly takes the thinking out of allocating and managing space on your file systems. But only sometimes.

Many operations on Solaris, OpenSolaris and Indiana will cause you to run into swap space issues. For example using the new Solaris 10 VirtualBox appliance, you will get the following message when you try to install the Recommended Patch Cluster:

Insufficient space available in /var/run to complete installation of this patch
set. On supported configurations, /var/run is a tmpfs filesystem resident in
swap. Additional free swap is required to proceed applying further patches. To
increase the available free swap, either add new storage resources to swap
pool, or reboot the system. This script may then be rerun to continue
installation of the patch set.

This is fixed easily enough by adding more swap space, like this:

# zfs create -V 1GB -b $(pagesize) rpool/swap2
# zfs set refreservation=1GB rpool/swap2
# swap -a /dev/zvol/dsk/rpool/swap2
# swap -l
swapfile             dev  swaplo blocks   free
/dev/zvol/dsk/rpool/swap 181,2       8 1048568 1048568
/dev/zvol/dsk/rpool/swap2 181,1       8 2097144 2097144

Setting the reservation is important, particularly if you plan on making the change permanent, eg by adding the new zvol as a swap entry in /etc/vfstab. Without a reservation ZFS does not set the space aside for swapping, so the swap system may believe it has space which isn't actually available.
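For reference, a matching permanent entry in /etc/vfstab would look something like this (one line, whitespace-separated fields, for the zvol created above):

/dev/zvol/dsk/rpool/swap2   -   -   swap   -   no   -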

The -b option sets the volblocksize to improve swap performance by aligning the volume I/O units on disk to the size of the host architecture memory page size (4 KB on x86 systems and 8KB on SPARC, as reported by the pagesize command.)

If this is just temporary, then cleaning up afterwards is just as easy:

# swap -d /dev/zvol/dsk/rpool/swap2
# zfs destroy rpool/swap2

It is also possible to grow the existing swap volume. To do so, set a new size and refreservation for the existing volume like this:

# swap -d /dev/zvol/dsk/rpool/swap
# zfs set volsize=2g rpool/swap
# zfs set refreservation=2g rpool/swap
# swap -a /dev/zvol/dsk/rpool/swap

And finally, it is possible to do the above without unmounting/remounting the swap device, by using the following "trick":

# zfs set volsize=2g rpool/swap
# zfs set refreservation=2g rpool/swap
# swap -l | awk '/rpool.swap/ {print $3+$4}'|read OFFSET
# env NOINUSE_CHECK=1 swap -a /dev/zvol/dsk/rpool/swap $OFFSET

The above calculates the offset into the swap device and adds a new "device" to the list of swap devices, which automatically makes the added space in the zvol usable. The offset shows up as the "swaplo" value in the swap -l output. Multiple swap devices on the same physical media are not ideal, but on the next reboot (or by deleting and re-adding the swap device) the system will recognise the full size of the volume.

No worries about where the space comes from. ZFS just allocates it! The flip side of the coin is that once you have quotas, reservations, allocations, indirect allocations such as from snapshots, figuring out where your space has gone can become quite tricky! I'll blog about this some time!

Monday, December 6, 2010

Useless Performance Comparisons

The point of performance comparison or benchmark articles seems to be purely sensational. By far most of these appear to have little value other than attracting less educated readers to the relevant websites.

In a recent article Michael Larabel of Phoronix reports on the relative performance of various file systems under Linux, specifically comparing the traditional Linux file systems to the new (not yet quite available) native ZFS module. According to the article ZFS performs slower than the other Linux file systems in most of the tests, but I have a number of issues with both how the testing was done and with how the article was written.

Solaris 11 Express should have been included in the test, and the results for OpenIndiana should be shown for all tests. It is also crucial that the report include other system metrics, such as CPU utilization during the test runs.

I also have some even more serious gripes. In particular, the blanket statement that some unspecified "subset" of the tests was performed on both a standard SATA hard drive and the SSD drive, but that the results were "proportionally" the same, does not make sense: some tests are more sensitive to seek latency than others, and some file systems hide these latencies better than others.

Another serious gripe is that there is no feature comparison. More complex software has more work to do, and one would expect some trade-offs.

Even worse: two of ZFS's strengths were eliminated by the way the testing was done. Firstly, when ZFS is given a "whole disk" as recommended in the ZFS best practices (as opposed to being given just a partition), it will safely enable the disk's write cache. It only does this if it knows that there are no other file systems on the disk, i.e. when ZFS is in control of the whole disk. Secondly, ZFS manages many disks very efficiently, particularly where space allocation is concerned: ZFS performance does not come into its own on a single-disk system!

Importantly, and especially so since this is very much a beta version of a port of a mature and stable product, we need to understand which of ZFS's features are present, different and/or missing compared to the mature product. For example, some of ZFS's biggest performance inhibitors under FUSE are that it is limited to a single-threaded ioctl (Ed: Apparently this is fixed in ZFS for Linux 0.6.9, but I am unable to tell whether this is the version Phoronix tested) and that it does not have access to the disk devices at a low level. The KQ Infotech website lists some missing features; particularly interesting is the missing Linux async I/O support. Furthermore, the KQ Infotech FAQ states that Direct IO falls back to buffered read/write functions and that missing kernel APIs are being emulated through the "Solaris Porting Layer".

A quick search highlights some serious known issues, such as the bug where data is duplicated between the Linux VFS cache and the ZFS ARC cache, which heavily impacts performance.

More information about missing features can be found on the LLNL issue tracker page.

If nothing else, the article should mention the fact that there are known severe performance issues and feature incompleteness with the Linux native ZFS module! The way in which Linux allocates and manages virtual address space is inefficient (don't take my word for it, see this and this), requiring expensive workarounds.

Besides all of this, my real, main gripe is about this kind of article in general. The common practice of testing everything with "default installation settings" implies that nothing else needs to be done - however, when you want the absolute best possible performance out of something, you need to tune it for the specific workload and conditions. In the case of the article in question, the statement reads "All file-systems were tested with their default mount options", and no other information is given, such as whether the disk was partitioned, whether the different subject file systems were mounted at the same time, what disk the system was booted from, and whether the operating system was running with part of the disk hosting the tested file system mounted as its root. We don't even know whether the author read the ZFS Best Practices Guide.

It can be argued that the average person will not tune the system, or in this case the file system, for one specific workload because their workstation should be an all-round performer, but you should still comply with the best practices recommendations from the vendors, especially if performance is one of your main criteria.

I don't know whether using defaults is ever acceptable in this kind of article. My issue stems from how these articles are written in a way that suggests that performance is the only, or at least the most important, factor in choosing an operating system, file system, graphics card, CPU or whatever the subject is. If that were true then at least the system should be tuned to make the most of each of the subject candidates, whether these are hardware or software parts being tested and compared to one another. This tuning is often done by disabling features and configuring the relevant options, and usually, to get it right, you would need a performance expert on that piece of software or hardware to optimize it for each test.

Specific hardware (or software) often favors one or the other of the entrants. An optimized, feature-poor system will outperform a complex, feature-rich system on limited hardware. Making the best use of the available hardware might mean different implementation choices when optimizing for performance rather than for functionality or reliability. ZFS in particular comes into its own, both in terms of features and performance, when it has a lot of hardware underneath it - RAM, disk controllers, and as many dedicated disk drives as possible. The other file systems have likely reached their performance limit on the limited hardware on which the testing was done. Linux is particularly aimed at the non-NUMA, single-core, single-hard-drive, single-user environment. Solaris, and ZFS, were developed in a company where single-user workstations were an almost non-existent target, the real target of course being the large servers of the big corporates.

As documented in the ZFS Evil Tuning Guide, many tuning options exist. One could turn off ZFS check-sum calculations, limit the ARC cache sizes, set the SSD disk as a cache or log device for the SATA disk, or set the pool to cache only metadata, to mention a few. Looking at the hardware available in the Phoronix article, the choices would depend on the specific test - in one test you might stripe between the SATA disk and the SSD disk, in another you might choose to mirror across the two.
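Purely to illustrate the kind of knobs the Evil Tuning Guide is talking about (these are not recommendations, and the pool name tank and SSD device c2t0d0 are made up for the example), such tuning boils down to one-liners like:

# zfs set checksum=off tank/bench        # trade integrity checking for speed
# zfs set primarycache=metadata tank     # keep only metadata in the ARC
# zpool add tank cache c2t0d0            # use the SSD as an L2ARC read cache
# zpool add tank log c2t0d0              # ...or instead as a separate ZIL log device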

The other file system candidates might have different recommendations in terms of how to optimize for performance.

I realize that the functionality would be affected by such tuning, but the article doesn’t look at functionality, or even usability for that matter. ZFS provides good reliability and data integrity, but only in its default configuration, with data check-summing turned on. The data protection levels and usable space in each test might be different, but that again is a function of which features are used and not the subject of the article, not even mentioned anywhere in the article.

As a case in point for the argument about functionality, one needs to consider all that ZFS is doing in addition to being a POSIX-compliant file system. It replaces the volume manager. It adds data integrity checking through check-summing. It manages space allocation automatically, including space for file systems, meta-data, snapshots, and ZVOLs (virtual block devices created out of a ZFS pool). Usage can be controlled by means of a complete set of reservation and quota options. Changing settings, such as turning on encryption, the number of copies of data to be kept, whether to do check-summing, etc, is dynamic. There is much more, as Google will tell.

And just to add insult to injury, the article goes and pits XFS against ZFS, ignoring the many severe reliability issues present with XFS, such as the often reported data corruption under heavy load and severe file system corruption when losing power.

I would really like to see a performance competition one day. The details of how the testing will be done will be given out in advance to allow the teams to research it. Each team is then given access to the same budget from which to buy and build their own system to enter into the competition. Their performance experts then set up and build the systems, and install the software and tune it for the tests on the specific hardware they have available. One team might buy a system with more CPUs while another might buy a system with more disks and SCSI controllers, but the test is fair (barring my observation about how feature-poor systems will always perform better on a low-budget system) because the teams each solve the same problem with the same budget. The teams submit their ready systems to the judges to run the performance test scripts, and publish their configuration details in a how-to guide. To eliminate cheating, an independent group will try to duplicate each team's test results using the guide.

I think this would make a fun event for a LAN party – any sponsors interested?

Sunday, April 26, 2009

ZFS user quotas available in SNV build 114

I noted, as per Chris Gerhard's Weblog that user and group Quotas on ZFS will be available soon - the fix to bug ID 6501037 is currently slated for inclusion in ON build 114.

Once this becomes available I will have one fewer item on my list of features missing from ZFS.

Currently, to limit users' consumption, the workaround documented here is to provide each user with a dedicated directory on which a separate dataset is mounted and a quota is set. This implies that the user can only create or write files in that specific directory. Tracking and limiting a user's total usage across an entire ZFS pool requires user quotas - ditto for consumption by group.
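In command form the workaround boils down to something like this (the pool layout and the user name joe are just an example):

# zfs create rpool/export/home/joe
# zfs set quota=5g rpool/export/home/joe
# chown joe /export/home/joe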

According to this post by Matty, the feature is implemented in a way which enforces the limit "tardily" - that is, enforcement may lag a little behind actual usage - and he also mentions that translated SIDs (eg when the directory is shared via SMB) are supported.

The PSARC/2009/204 document here provides details of how the quotas are implemented. Two new zfs subcommands, namely zfs userspace and zfs groupspace, report the consumption, and control is by means of a set of new properties on ZFS file system datasets.
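Once the feature is available, usage should look roughly like the following sketch (the dataset, user and group names are made up for illustration):

# zfs set userquota@joe=5g rpool/export/home
# zfs set groupquota@staff=50g rpool/export/home
# zfs userspace rpool/export/home
# zfs groupspace rpool/export/home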

This amounts to good news all around. Maybe I should start tracking bug IDs for all of the items on my feature wish-list!

Sunday, July 6, 2008

ZFS missing features

What would truly make ZFS be The Last Word in File Systems (PDF)?

Why every feature of course! Here is my wishlist!

  1. Nested vdevs (eg Raid 1+Z)
  2. Hierarchical Storage Management (migrate rarely used files to cheaper/slower vdevs)
  3. Traditional Unix quotas (i.e. for when you have multiple users owning files in the same directories spread out across a file system)
  4. A way to convert a directory on a ZFS file system into a new ZFS file system, and the corresponding reverse function to merge a directory back into its parent (because the admin made some wrong decision)
  5. Backup function supporting partial restores. In fact partial backups should be possible too, eg backing up any directory or file list, not necessarily only at the file system level. And restores which do not require the file system to be unmounted / re-mounted.
  6. Re-layout of pools (to accommodate adding disks to a raidz, converting a non-redundant pool to raidz, removing disks from a pool, etc) (Yes, I'm aware of some work in this regard)
  7. Built-in Multi-pathing capabilities (with automatic/intelligent detection of active paths to devices), eg integrated MPxIO functionality. I'm guessing this is not there yet because people may want to use MPxIO for other devices not under ZFS control and this will create situations where there are redundant layers of multipathing logic.
  8. True Global File System functionality (multiple hosts accessing the same LUNs and mounting the same file systems with parallel writes), or even just a sharezfs option (like sharenfs, but allowing the client to access ZFS features, eg to set ZFS properties, create datasets, snapshots, etc, similar in functionality to what is possible when granting a zone ownership of a ZFS dataset)
  9. While we're at it: In place conversion from, eg UFS to ZFS.
  10. The ability to snapshot a single file in a ZFS file system (So that you can affect per-file version tracking)
  11. An option on the zpool create command to take a list of disks and automatically set up a layout, intelligently taking into consideration the number of disks and the number of controllers, allowing the user to select from a set of profiles determining optimization for performance, space or redundancy.

So... what would it take to see ZFS as the new default file system on, for example USB thumb drives, memory cards for digital cameras and cell phones, etc? In fact, can't we use ZFS for RAM management too (snapshot system memory)?




Tuesday, July 1, 2008

Let ZFS manage even more space more efficiently

The idea of using ZFS to manage process core dumps begs to be expanded to at least crash dumps. This also enters into the realm of Live Upgrade, as it eliminates the need to sync potentially large amounts of data on activation of a new BE!

Previously I created a ZFS file system in the root pool, and mounted it on /var/cores.

The same purpose would be even better served with a generic ZFS file system which can be mounted on any currently active Live-Upgrade boot environment. The discussion here suggests the use of a ZFS file system rpool/var_shared, mounted under /var/shared. Directories such as /var/crash and /var/cores can then be moved into this shared file system.

So:

/ $ pfexec ksh -o vi
/ $ zfs create rpool/var_shared
/ $ zfs set mountpoint=/var/shared rpool/var_shared
/ $ mkdir -m 1777 /var/shared/cores
/ $ mkdir /var/shared/crash
/ $ mv /var/crash/`hostname` /var/shared/crash

View my handiwork:

/ $ ls -l /var/shared

total 6
drwxrwxrwt   2 root     root           2 Jun 27 17:11 cores
drwx------   3 root     root           3 Jun 27 17:11 crash
/ $ zfs list -r rpool
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool                     13.3G  6.89G    44K  /rpool
rpool/ROOT                10.3G  6.89G    18K  legacy
rpool/ROOT/snv_91         5.95G  6.89G  5.94G  /.alt.tmp.b-b0.mnt/
rpool/ROOT/snv_91@snv_92  5.36M      -  5.94G  -
rpool/ROOT/snv_92         4.33G  6.89G  5.95G  /
rpool/dump                1.50G  6.89G  1.50G  -
rpool/export              6.83M  6.89G    19K  /export
rpool/export/home         6.81M  6.89G  6.81M  /export/home
rpool/swap                1.50G  8.38G  10.3M  -
rpool/export/cores          20K  2.00G    20K  /var/cores
rpool/var_shared            22K  3.00G    22K  /var/shared

Just to review the current settings for saving crash dumps:

/ $ dumpadm

      Dump content: kernel pages
       Dump device: /dev/zvol/dsk/rpool/dump (dedicated)
Savecore directory: /var/crash/solwarg
  Savecore enabled: yes

Set it to use the new path I made above:

/ $ dumpadm -s /var/shared/crash/`hostname`

      Dump content: kernel pages
       Dump device: /dev/zvol/dsk/rpool/dump (dedicated)
Savecore directory: /var/shared/crash/solwarg
  Savecore enabled: yes

Similarly I update the process core dump settings:

/ $ coreadm -g /var/shared/cores/core.%z.%f.%u.%t
/ $ coreadm

     global core file pattern: /var/shared/cores/core.%z.%f.%u.%t
     global core file content: default
       init core file pattern: core
       init core file content: default
            global core dumps: disabled
       per-process core dumps: enabled
      global setid core dumps: enabled
 per-process setid core dumps: disabled
     global core dump logging: enabled

And finally, some cleaning up:

/ $ zfs destroy rpool/export/cores
/ $ cd /var
/var $ rmdir crash
/var $ ln -s shared/crash
/var $ rmdir cores

As previously, the above soft link is just in case there is a naughty script or tool somewhere with a hard-coded path to /var/crash/`hostname`. I don't expect to find something like that in officially released Sun software, but I do sometimes use programs not officially released or supported by Sun.

This makes me wonder what else can I make it do! I'm looking forward to my next Live Upgrade to see how well it preserves my configuration before I attempt to move any of the spool directories from /var to /var/shared!



Thursday, June 19, 2008

Using a dedicated ZFS file system to manage process core dumps

ZFS just bristles with potential. Quotas, Reservations, turning compression or atime updates on or off without unmounting. The list goes on.

So now that we have ZFS root (since Nevada build snv_90, and even earlier when using OpenSolaris or other distributions), let's start to make use of these features.

First thing is, on my computer I don't care about access time updates on files or directories, so I disable it.

/ $ pfexec zfs set atime=off rpool

That is not particularly spectacular in itself, but since it is there I use it. The idea is of course to save a few disk updates and the corresponding IOs.

Next: core dumps. One of my pet hates. Many processes dump core in your home dir, these get overwritten or forgotten, and then there are any number of core files lying around all over the file systems, all of these just wasting space since I don't really intend to try to analyze any of them.

Solaris has got a great feature by which core dumps can be all directed to go to a single directory and, on top of that, to have more meaningful file names.

So the idea is to create a directory, say /var/cores and then store the core files in there for later review. But knowing myself these files will just continue to waste space until I one day decide to actually try and troubleshoot a specific issue.

To me this sounds like a perfect job for ZFS.

First I check that there is not already something called /var/cores:

/ $ ls /var/cores
/var/cores: No such file or directory

Great. Now I create it.

/ $ pfexec zfs create rpool/export/cores
/ $ pfexec zfs set mountpoint=/var/cores rpool/export/cores

And set a limit on how much space it can ever consume:

/ $ pfexec zfs set quota=2g rpool/export/cores

Note: This can easily be changed at any time, simply by setting a new quota.
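For example, to raise the limit later (the new size here is arbitrary):

/ $ pfexec zfs set quota=4g rpool/export/cores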

Which creates the below picture.

/ $ df -h
Filesystem size used avail capacity Mounted on
rpool/ROOT/snv_91 20G 5.9G 7.0G 46% /
/devices 0K 0K 0K 0% /devices
/dev 0K 0K 0K 0% /dev
ctfs 0K 0K 0K 0% /system/contract
proc 0K 0K 0K 0% /proc
mnttab 0K 0K 0K 0% /etc/mnttab
swap 2.3G 416K 2.3G 1% /etc/svc/volatile
objfs 0K 0K 0K 0% /system/object
sharefs 0K 0K 0K 0% /etc/dfs/sharetab
/usr/lib/libc/libc_hwcap1.so.1 13G 5.9G 7.0G 46% /lib/libc.so.1
fd 0K 0K 0K 0% /dev/fd
swap 2.3G 7.2M 2.3G 1% /tmp
swap 2.3G 64K 2.3G 1% /var/run
rpool/export 20G 19K 7.0G 1% /export
rpool/export/home 20G 6.8M 7.0G 1% /export/home
rpool 20G 44K 7.0G 1% /rpool
rpool/export/cores 2.0G 18K 2.0G 1% /var/cores
SHARED 61G 24K 31G 1% /shared
... snip ...

And checking the settings on the /var/cores ZFS file system

/ $ zfs get all rpool/export/cores
NAME PROPERTY VALUE SOURCE
rpool/export/cores type filesystem -
rpool/export/cores creation Thu Jun 19 14:18 2008 -
rpool/export/cores used 18K -
rpool/export/cores available 2.00G -
rpool/export/cores referenced 18K -
rpool/export/cores compressratio 1.00x -
rpool/export/cores mounted yes -
rpool/export/cores quota 2G local
rpool/export/cores reservation none default
rpool/export/cores recordsize 128K default
rpool/export/cores mountpoint /var/cores local
rpool/export/cores sharenfs off default
rpool/export/cores checksum on default
rpool/export/cores compression off default
rpool/export/cores atime off inherited from rpool
rpool/export/cores devices on default
rpool/export/cores exec on default
rpool/export/cores setuid on default
rpool/export/cores readonly off default
rpool/export/cores zoned off default
rpool/export/cores snapdir hidden default
rpool/export/cores aclmode groupmask default
rpool/export/cores aclinherit restricted default
rpool/export/cores canmount on default
rpool/export/cores shareiscsi off default
rpool/export/cores xattr on default
rpool/export/cores copies 1 default
rpool/export/cores version 3 -
rpool/export/cores utf8only off -
rpool/export/cores normalization none -
rpool/export/cores casesensitivity sensitive -
rpool/export/cores vscan off default
rpool/export/cores nbmand off default
rpool/export/cores sharesmb off default
rpool/export/cores refquota none default
rpool/export/cores refreservation none default

Note that access-time updates on this file system are off - the setting has been inherited from the pool. The only "local" settings are the mountpoint and the quota, which correspond to the items that I've specified manually.

Now just to make new core files actually use this directory. At present, the default settings from coreadm looks like this:

/ $ coreadm
global core file pattern:
global core file content: default
init core file pattern: core
init core file content: default
global core dumps: disabled
per-process core dumps: enabled
global setid core dumps: disabled
per-process setid core dumps: disabled
global core dump logging: disabled

Looking at the coreadm man page, there is a fair amount of flexibility in what can be done. I want core files to have a name identifying the zone in which the process was running, the process executable file, and the user. I also don't want core dumps to overwrite when the same process keeps on faulting, so I will add a time stamp to the core file name.

/ $ pfexec coreadm -g /var/cores/core.%z.%f.%u.%t

And then I would like to enable logging of an event any time when a core file is generated, and also to store core files for Set-UID processes:

/ $ pfexec coreadm -e global-setid -e log

And finally, just to review the core-dump settings, these now look like this:

/ $ coreadm
global core file pattern: /var/cores/core.%z.%f.%u.%t
global core file content: default
init core file pattern: core
init core file content: default
global core dumps: disabled
per-process core dumps: enabled
global setid core dumps: enabled
per-process setid core dumps: disabled
global core dump logging: enabled

Now if that is not useful, I don't know what is! You will soon start to appreciate just how much space is wasted and just how truly rigid and inflexible other file systems are once you run your machine with a ZFS root!




Wednesday, June 4, 2008

Sharing a ZFS pool between Linux and Solaris

If you are multi-booting between Linux and Solaris (and others like FreeBSD, OpenBSD and Mac OS X, I expect) you will sooner or later encounter the problem of how to share disk space between the operating systems. FAT32 is not satisfactory due to its lack of POSIX features, in particular file ownership and access modes, not to mention its sub par performance. ext2/3 is not an option because you only get read-only support for it in Solaris, and similarly UFS enjoys only read-only support in Linux. The whole situation is rather depressing.


Enter ZFS.


This all started because I discovered that I can have a ZFS root file system without having to install OpenSolaris. The trick as some of you may know, is to select "Solaris Express" from the first menu on booting the install disk, and then select one of the two "Interactive Text" options from the next menu. This puts you back into 1984 in terms of installers, but you get the option of using ZFS for root!


Note: It might be possible to do this with the default installer, but on my computer the installer just would not run (I got some daft error about fonts and mouse themes). With a ZFS root, the swap and dump devices automatically go onto dedicated zvols, and you save a lot in terms of pre-allocated space.


I have of course used ZFS on my laptop previously as a test, but the benefits were limited by the fact that I still had "slices" for the OS and a small ZFS pool on a spare slice.


I'm not sure which build of Nevada first introduced the ZFS root option in the installer, but it is available in build 90 at least.


My choice of Linux distribution is Ubuntu 8.04. The steps to setting up a ZFS pool shared across operating systems are as follows:


1. Select a Partitioning scheme with minimal space allocated to each of Ubuntu and Nevada.
I decided to put Ubuntu in an Extended partition with a 10 GB Logical Partition for the OS, /var and /home, and a 1 GB Logical partition for Swap.
For Solaris I allocated a 24 GB primary partition to become the ZFS root pool, which includes Swap, Dump, OS and Live-upgrade space.
The balance of the 100 GB disk will be shared between Ubuntu and Solaris using ZFS.


Note: Linux and Solaris have somewhat different views on how disk partitioning works.
Due to historical reasons, in particular compatibility with Solaris on SPARC hardware, Solaris slices live in a single primary partition with an identifier of 0x82 (SOLARIS) or 0xbf (SOLARIS2), somewhat like how logical fdisk partitions live inside an "extended partition".


2. Install Ubuntu first, creating only the partitions for it. Remember to not have any external drives connected as it can screw up the order in which drives are detected and as a result bugger up the Grub menu list.


During the installation you create an Admin user. This will eventually become a "backup" admin user.


3. Reboot and load patches/updates, and backup the Grub /boot/grub/menu.lst file to an external media such as a USB thumb drive for easy access. The Ubuntu Grub does not understand ZFS, so you need to use Nevada's Grub to manage the multi-booting.


4. Also set Ubuntu to use the hardware clock as local time instead of UTC (this is what Solaris uses). To do this, change UTC=yes to UTC=no in /etc/default/rcS, then reboot.
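If you prefer a one-liner for that edit, something along these lines should do it (GNU sed in-place edit - check the file afterwards before rebooting):

sudo sed -i 's/^UTC=yes/UTC=no/' /etc/default/rcS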


5. Install Nevada. Use either of the Interactive Text installer options, but for simplicity's sake specify the system as non-networked.


6. Reboot and create a user for every-day use, and add this user to the "Primary Administrator" profile using usermod -P "Primary Administrator" <userid>


7. Add the Ubuntu Grub entries you saved in step 3 to the end of the Nevada grub menu.lst file. This will be stored in /boot/grub/menu.lst (The default pool name is rpool)


8. Reboot back into Ubuntu, then follow the Linux ZFS-FUSE installation instructions to get ZFS-FUSE installed. I used the trunk to get the latest ZFS updates from Opensolaris.org included. Also see this Ubuntu Wiki page, and Ralf Hildebrand's blog for more info.


For reference, this is the procedure I used

apt-get install mercurial build-essential scons libfuse-dev libaio-dev devscripts zlib1g-dev
cd ~
hg clone http://www.wizy.org/mercurial/zfs-fuse/trunk
cd trunk/src
scons
sudo scons install


9. Create an fdisk partition for the shared ZFS pool using the remaining disk space. I used a primary partition and set the identifier to W95 FAT32, though this is probably unimportant.


10. While still running Ubuntu, create a ZFS pool on this new fdisk partition using commands like these (substitute the partition device you created in step 9 for /dev/sdXN):

sudo /usr/local/sbin/zfs-fuse
sudo /usr/local/sbin/zpool create -m /shared SHARED /dev/sdXN


I like to give my ZFS pools names in all-capitals, purely because it makes the ZFS pool devices stand out better in the output from df and mount.


WARNING: I found that if I created the ZFS pool under Solaris, it refused to import into Ubuntu, but if I created it under Linux it imports/exports just fine in both directions. Both pools are created as version 10 pools, so the reason for this is not obvious. If you do decide to experiment with creating the pool under Solaris, when you want to really get rid of the pool you will discover you need to dd zeros over the partition before creating the pool again, otherwise the condition remains unchanged despite destroying and re-creating the pool. If you do experiment with this please do share your results!


11. Export the ZFS pool using

/usr/local/sbin/zpool export SHARED


12. Reboot into Nevada and import the pool using

/usr/local/sbin/zpool import SHARED


Note: If you forget to export before you shut down, you will need to add -f to force the import after booting into the other OS.


At this point I just sat there and stared in wonder at how well it actually works. There is beauty in finally seeing this working!


13. Create some init.d / rc scripts to automate the import/export on shutdown/startup.
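A minimal sketch of such a script for the Ubuntu side, assuming zfs-fuse itself is already started at boot and that the binaries live in /usr/local/sbin as per the build above:

#!/bin/sh
# /etc/init.d/shared-zpool - import/export the SHARED pool around reboots
case "$1" in
  start) /usr/local/sbin/zpool import SHARED ;;
  stop)  /usr/local/sbin/zpool export SHARED ;;
  *)     echo "Usage: $0 {start|stop}"; exit 1 ;;
esac
exit 0

Link it into the run levels with update-rc.d shared-zpool defaults; on the Solaris side the equivalent can be done with a small SMF service or rc script.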


14. Now you can start customizing both operating environments. You may want to setup Automatic network configuration by enabling the SMF for NWAM in Solaris, eg by doing:

pfexec svcadm disable physical:default
pfexec svcadm enable physical:nwam


I'm looking forward to testing Live Upgrade on my setup with ZFS root, and to getting a shared home directory to work well for both Solaris and Ubuntu. I have created a login ID with the same gid/uid and a home directory under the shared ZFS pool, but after a few changes it got broken under Ubuntu, probably due to subtle differences in how Gnome/Desktop config items are stored and/or expected.


Despite my initial scepticism about FUSE, it is actually quite functional. All in all, I love being able to share a file system - well, many file systems - between the two operating environments!



Sunday, March 9, 2008

Automating the system identification for a Solaris zone to speed up zone deployment


Recapping, I demonstrated how to create a basic Solaris zone from scratch. Then I showed how to use ZFS snapshots to add the ability to “reset” a zone to a clean state, and how to speed up the definition step by exporting a zone configuration file and then using this as a template for defining zones.


This can save a considerable amount of time with complex zones. The other two steps of creating a zone, namely installing it (populating it with files) and setting it up by completing the system identification during the first boot, can also be improved on: the first by using the zoneadm "clone" feature, and the second by using a pre-defined sysidcfg file (and maybe a few other tweaks) injected into the zone file system before it is booted the first time.


This blog entry talks about the second of these.


The sysidcfg file is simply a text file with lines specifying the values for the various options. This file is placed in the zone's /etc directory in its root file system, before the zone is booted. During boot-up the file is read and any specified values are applied, while any missing items will be prompted for as per normal.


The items that can be set are as follows:


Security Policy (security_policy): Kerberos or NONE. If set to "kerberos", additional properties can be set. If not specified, a value will be prompted for.

Name Service (name_service): NIS, NIS+, LDAP, DNS or NONE. Some additional properties are available when using NIS, NIS+, LDAP or DNS. If not specified, you will be prompted for the appropriate value(s).

NFSv4 Domain Name (nfs4_domain): Either the keyword "dynamic", or the fully qualified domain name to be used for the NFSv4 domain. If not specified, you will be prompted for the appropriate value(s).

Region and Time zone (timezone): Either a time zone from /usr/share/lib/zoneinfo/* or a GMT-offset value. If not specified, you will be prompted for this information.

Terminal Type (terminal): The TERM type, eg vt100. If not specified, you will be prompted for this value.

Locale (system_locale): A locale, eg C, as found under /usr/lib/locale. If not specified, you will be prompted for this value.

Root Password (root_password): The encrypted root password. The easiest way to get this is to create a dummy user, set its password to what you want, and then copy the encrypted value from the /etc/shadow file. Other options include writing a little perl script or C program to produce the encrypted version of a password. If not specified, you will be prompted during the first boot.

Network Settings (network_interface): Except for the hostname, these are normally obtained from the zone definition. They can be specified here to override those values, but will not be prompted for if not specified.
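On the root_password item, a quick way to generate the encrypted value from the shell is a perl one-liner along these lines (the password and the two-character salt here are of course just examples):

# perl -e 'print crypt("S3cr3t99", "xy"), "\n"'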


Note: It is entirely possible to use sys-unconfig in a zone. Doing so will have a similar effect to running sys-unconfig on a global zone or normal Solaris system: the zone will halt, and on the next boot you will be presented with prompts for the system identification items. Be aware that sys-unconfig also removes the zone's host keys (eg for SSH), and new ones will be generated on the first boot after the system was un-configured.


Something else to note is that a zone's "hostname" and "zone name" do not have to be the same. If you do keep them the same, there will be less opportunity for confusion. While the other network settings for a zone are obtained automatically from the zone's definition, the hostname will still be prompted for. To eliminate this prompt, include a network settings section in the zone's sysidcfg file.


Some items available in the sysidcfg file for a normal system cannot be set during a zone's system identification, as they rely on configuring the kernel and a zone does not have its own dedicated kernel. These include items like power management and the date and time, including a time-server.


An example of a basic sysidcfg file might look like this:


bash-3.2# cat sysidcfg

nfs4_domain=dynamic

security_policy=NONE

timezone=Africa/Johannesburg

terminal=vt100

system_locale=C

name_service=NONE

network_interface=PRIMARY {hostname=ziggy.mydomain}


In the above example the keyword PRIMARY is used to automatically select the only interface configured on this zone. This effectively allows for setting the hostname in the sysidcfg template with minimal fuss. It is of course also possible to use the interface name.


If any of the options are omitted from the file, those items will be prompted for in the usual manner. I did not specify the root login password, so that will be the only item which will be prompted for during the boot up process.


To test this, do the following:

  1. Define the zone (using zonecfg)

  2. Install the zone (using zoneadm -z zonename install)

  3. Copy the sysidcfg file to the zone's etc file, eg
    cp sysidcfg.template /export/zones/zonename/root/etc/sysidcfg

  4. Boot the zone and connect to its console, eg
    zoneadm -z zonename boot; zlogin -C zonename


And voila! Now you can automate the zone definition and the zone's system identification. In the next part I'll show how to speed up the Installation step.






Saturday, February 23, 2008

First steps to cloning a Solaris Zone

Today I want to just mention a few concepts that I've been deliberately neglecting. Some of the ideas I mentioned in my 15-minute-to-your-first-zone guide and in my post about Automating Zone creation, as well as ideas I will still be posting about in the next few posts, are based on these.

Firstly Sun has cleverly integrated Zone management with the ZFS file system.

1. If the parent directory of a Zone's root is on a ZFS file system, then zoneadm will create a new ZFS file system for the rootpath of the zone.

2. Stopping and starting the zone will mount and unmount the zone's root file system

3. Cloning a zone by means of the zoneadm utility will automatically use a ZFS snapshot to create a ZFS clone which will be mounted on the rootpath of the new zone.

This is even more interesting because prior to release 11/06 of Solaris 10, running a zone with its rootpath on a ZFS file system was an unsupported configuration.

The second thing is that resources which are only needed while a zone is running are created and destroyed dynamically when the zone is started or halted. In particular this applies to network interfaces and loopback file systems. When you start up a zone, you will notice that new entries were created for its interfaces, and new file systems were mounted. The file systems which are managed in this way are normally hidden from df in the global zone, but show up when you run df with the new -Z switch.

The next concept is that of how zlogin works. You can think of zlogin as a kind of "su" command, but instead of running a command under a different userid, it runs a command in a different zone. The default command which it runs is a shell. You can also compare it to using ssh or rexec to run a specified command somewhere else, though there is no networking involved.


The concept that every process has got a UID and GID which control its access to files and system calls is extended in Solaris by a new field storing the process' Zone-ID. This, together with process permission flags and a chroot, is essentially what zones are, but more on that later.


Using zlogin to run a command or create a new shell in a zone will create a wtmpx login record having zone:global as the origin of the session.

zlogin with the -C option depends on the zone's console device, and creates a wtmpx entry in the zone with the console recorded as the origin of the login session.

A few things which I consider to be good zone management habits:

1. Keep an entry for each zone's IP address in the global zone's hosts file, and maintain the /etc/inet/netmasks file with entries for all the subnets you will be using.

2. If you put each zone in its own file system then they cannot all "fill up" at the same time. With ZFS file systems, this requires that you set quotas and/or reservations on the zones' root file systems (see the sketch below).
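For example (assuming the zone root datasets live under SPACE/zones as in my earlier example, with arbitrary sizes):

globalzone # zfs set quota=8g SPACE/zones/myfirstzone

globalzone # zfs set reservation=2g SPACE/zones/myfirstzone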

Finally a very simple yet important and eventually very powerful concept, namely exporting and importing of zone configurations.

I want to demonstrate this using as an example the configuration from an existing zone. First we export it and store it in a text file, like this:

globalzone # zonecfg -z myfirstzone export > /tmp/zone_config.txt

Have a look at the file ...

globalzone # cat /tmp/zone_config.txt

create -b

set zonepath=/export/zones/myfirstzone

set autoboot=false

set ip-type=shared

add net

set address=192.168.100.131

set physical=e1000g0

end

Now just make a few small changes to the file - specifically we update the zonepath and the IP address


globalzone # sed '

/zonepath/ s/myfirstzone/firstclone/;

/address=/ s/100.131/100.132/

' /tmp/zone_config.txt > /tmp/clone_config.txt

Of course you could use your favourite text editor to do that, but using sed is just so sexy.

globalzone # cat /tmp/clone_config.txt

create -b

set zonepath=/export/zones/firstclone

set autoboot=false

set ip-type=shared

add net

set address=192.168.100.132

set physical=e1000g0

end

We will feed this config file into zonecfg. Of course you could just manually set each of those entries, but zone configurations can easily get complex - you may have multiple network interfaces, many file systems, and several other non-default settings like resource controls, something I'll get to in due course.

Right now what I have on my system (the "before" picture)

globalzone # zoneadm list -vc

ID NAME STATUS PATH BRAND IP

0 global running / native shared

- myfirstzone installed /export/zones/myfirstzone native shared

- disposable installed /export/zones/disposable native shared

Creating the zone config based on this:

globalzone # zonecfg -z firstclone -f /tmp/clone_config.txt

Then the "after" picture showing the new zone configured...

globalzone # zoneadm list -vc

ID NAME STATUS PATH BRAND IP

0 global running / native shared

- myfirstzone installed /export/zones/myfirstzone native shared

- disposable installed /export/zones/disposable native shared

- firstclone configured /export/zones/firstclone native shared

All that remains is to populate this new "cloned" zone with user-land bits...

globalzone # timex zoneadm -z firstclone install

A ZFS file system has been created for this zone.

Preparing to install zone <firstclone>.

Creating list of files to copy from the global zone.

Copying <188162> files to the zone.

Initializing zone product registry.

Determining zone package initialization order.

Preparing to initialize <1307> packages on the zone.

Initialized <1307> packages on zone.

Zone <firstclone> is initialized.

Installation of <1> packages was skipped.

The file </export/zones/firstclone/root/var/sadm/system/logs/install_log> contains a log of the zone installation.

real 28:13.55

user 4:46.13

sys 7:18.22

And we're done!


Over 28 minutes – that is still much too slow. In the next post I will use this concept and add to it to take "cloning" to the next level - it should not take more than a few seconds.

Thursday, February 21, 2008

Automating Solaris Zone creation

Zones can be treated as cheap, disposable application containers. Automated Zone creation is not necessarily there to allow you to rapidly deploy 1000s of Zones (though it could certainly be used for this purpose given sufficient planning), but allows you to create and use, then delete and easily re-create zones freshly and with a consistent configuration.

You will find that most, if not all, of your zones will use the same naming-services configuration, be in the same time-zone, attach to the same network interface (just with different IP addresses), etc. Many of the System Identification and system configuration settings will be identical or very similar between the Zones.

You might even find that with each new zone you create the same set of user-ids and have them all get their home directories from a central home-directory server. Basically repeat work. Computers are, in fact, good at repeatedly doing the same task over and over, without getting bored.

If all you want to achieve is to have a clean state to which you can restore a zone easily, then a fine plan would be to use file system snapshots, something like this:

1. Preparation / Setup

1.1. Create a file system structure in which to store the Zone. Since we've got ZFS for free with Solaris there's really no reason not to use it.

1.2. Set up the Zone in this file system, and complete the configuration up to the point where you want to be able to revert back to.

1.3. Shut down the Zone and take a snapshot.

2. Using this Zone:

2.1 Make any instance specific "custom" configuration changes (add some disk space, user-ids, tweak some settings)

2.2 Start the zone and let the users loose in it.

3. Reverting to the clean status

3.1 Bring the zone down (purely to make sure that no processes have files open in the file system containing the zone)

3.2 Recover the file system back to the Snap-shot state.

3.3 Go back to nr 2 above.

Before I show an example of doing this using ZFS, suffer me to mention the other techniques involved in automating Solaris Zone creation (Each of which I will cover in a separate blog post in detail)

Firstly copying the Zone configuration. This involves creating a zone config and exporting it to a file to be used as a template in the future. Then each time you want to create a zone based on this template, you just make a few small changes such as the zone-name and IP address, then import this modified copy of the template into a new zone, after which you continue with the normal zone installation.

Using a sysidcfg file and a few other tricks to speed up the zone configuration is quite similar to using a sysidcfg file to pre-configure a system from a jumpstart, and can be used to automate settings such as the timezone, locale, terminal type, networking, and name-services, amongst others.

Cloning Zones to speed up the install process. The Zone management framework from Sun gives us the ability to "clone" a master "template" zone. This involves creating one (or more) template zones which you then leave fully installed and configured, but don't actually ever start up or use, other than to tweak their configurations. This saves time during the actual install and subsequent configuration steps.

With that out of the way, on to the example of how to make a simple disposable Zone. As always the fixed-width text represents what you should see on the screen. I highlight the bits you enter.

globalzone# zpool create SPACE c0d0s4

globalzone# zfs create SPACE/zones

globalzone# zfs set mountpoint=/export/zones SPACE/zones

globalzone# zfs create SPACE/zones/disposable

globalzone# chmod 0700 /export/zones/disposable

globalzone# zfs set atime=off SPACE/zones/disposable

Disabling of “atime” above is a personal preference thing. Now we set up a simple zone. Yours can be as complicated or as simple as you want it to be.

globalzone# zonecfg -z disposable

zonecfg:disposable> set zonepath=/export/zones/disposable

zonecfg:disposable> add net

zonecfg:disposable:net> set physical=e1000g0

zonecfg:disposable:net> set address=192.168.24.133

zonecfg:disposable:net> end

zonecfg:disposable> verify

zonecfg:disposable> commit

zonecfg:disposable> exit

globalzone# zoneadm -z disposable install

cannot create ZFS dataset SPACE/zones/disposable: dataset already exists

Preparing to install zone <disposable>.

Creating list of files to copy from the global zone.

Copying <9386> files to the zone.

Initializing zone product registry.

Determining zone package initialization order.

Preparing to initialize <1307> packages on the zone.

Initialized <1307> packages on zone.

Zone <disposable> is initialized.

Installation of <1> packages was skipped.

Installation of these packages generated warnings:

The file </export/zones/disposable/root/var/sadm/system/logs/install_log> contains a log of the zone installation.

For the eagle-eyed amongst you, the WebStackTooling failure is due to the fact that this is a sparse zone and I'm running beta software (Nevada Build 80). In a sparse zone the /usr file system is read-only, and the WebStackTooling package is trying to create or change some files there. I'm just ignoring this error for now as it does not bother me.

So far, so good. Let's save a backup of what we've got so far.

globalzone# zfs snapshot SPACE/zones/disposable@freshly_installed

Now we perform the first boot and system identification. Below is an abbreviated copy-paste showing the flow of the process.

globalzone# zoneadm -z disposable boot; zlogin -C disposable

[Connected to zone 'disposable' console]

Configuring Services ... 150/150

Reading ZFS config: done.

>>> Select a Language

>>> Select a Locale

>>> What type of terminal are you using?

Creating new rsa public/private host key pair

Creating new dsa public/private host key pair

Configuring network interface addresses: e1000g0.

>>> Host name for e1000g0:1 disposable

>>> Configure Security Policy:

>>> Name Service

>>> NFSv4 Domain Name:

>>> Region and Time zone: Africa/Johannesburg

>>> Root Password

System identification is completed.

rebooting system due to change(s) in /etc/default/init

[NOTICE: Zone rebooting]

SunOS Release 5.11 Version snv_80 64-bit

Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.

Use is subject to license terms.

Hostname: disposable

Reading ZFS config: done.

disposable console login: root

Password:

Feb 20 21:33:05 disposable login: ROOT LOGIN /dev/console

Sun Microsystems Inc. SunOS 5.11 snv_80 January 2008

You may want to make a few more changes now that the zone is running. Some ideas may be to set up user-IDs, enable/disable some services, and set up some NFS and/or automounter file systems.

# mkdir /export/home

# useradd -c "Joe Blogs" -d /export/home/joeblogs -m joeblogs

# passwd joeblogs

Assuming you've done all you want, this is the point where we have a cleanly built zone, running, and essentially the point that we would like to be able to return to after we did whatever make-and-break or sandbox testing. The Zone should be halted before we take the snapshot, even if only to close all open files.

# halt

Feb 20 21:33:12 disposable halt: initiated by root on /dev/console

Feb 20 21:33:12 disposable syslogd: going down on signal 15

[NOTICE: Zone halted]

~.

[Connection to zone 'disposable' console closed]

Now just take another ZFS snapshot:

globalzone# zfs snapshot SPACE/zones/disposable@system_identified

=================

Now the Zone is ready for you to let your users loose in it. Allow them to have full root access, go crazy, run "rm -r /", etc.

globalzone# zoneadm -z disposable boot; zlogin -C disposable

zoneadm: zone 'disposable': WARNING: e1000g0:1: no matching subnet found in netmasks(4) for 192.168.24.133; using default of 255.255.255.0.

[Connected to zone 'disposable' console]

Hostname: disposable

Reading ZFS config: done.

disposable console login: root

Password:

Feb 20 21:40:11 disposable login: ROOT LOGIN /dev/console

Last login: Wed Feb 20 21:33:05 on console

Sun Microsystems Inc. SunOS 5.11 snv_80 January 2008

Now perform some "work" - Create a few directories, modify some files, etc. I chose to run sys-unconfig.

# sys-unconfig

WARNING

This program will unconfigure your system. It will cause it

to revert to a "blank" system - it will not have a name or know

about other systems or networks.

This program will also halt the system.

Do you want to continue (y/n) ? y

sys-unconfig started Wed Feb 20 21:40:30 2008

sys-unconfig completed Wed Feb 20 21:40:30 2008

Halting system...

svc.startd: The system is coming down. Please wait.

svc.startd: 59 system services are now being stopped.

svc.startd: The system is down.

[NOTICE: Zone halted]

Then, back in the global zone, examine the available ZFS snapshots:

globalzone# zfs list

NAME USED AVAIL REFER MOUNTPOINT

SPACE 684M 14.1G 18K /SPACE

SPACE/zones 684M 14.1G 19K /export/zones

SPACE/zones/disposable 684M 14.1G 624M /export/zones/disposable

SPACE/zones/disposable@freshly_installed 790K - 523M -

SPACE/zones/disposable@system_identified 59.2M - 611M -

These four commands can go nicely into a little "revert" script.

globalzone# zfs clone SPACE/zones/disposable@system_identified \

SPACE/zones/reverted_temp

globalzone# zfs promote SPACE/zones/reverted_temp

globalzone# zfs destroy SPACE/zones/disposable

globalzone# zfs rename SPACE/zones/reverted_temp SPACE/zones/disposable
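Wrapped into a script, the revert might look like this sketch (the dataset and snapshot names are the ones used above; make sure the zone is halted first):

#!/bin/ksh
# revert the disposable zone to its clean, system-identified snapshot
zfs clone SPACE/zones/disposable@system_identified SPACE/zones/reverted_temp
zfs promote SPACE/zones/reverted_temp
zfs destroy SPACE/zones/disposable
zfs rename SPACE/zones/reverted_temp SPACE/zones/disposable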

That took just a few seconds, and we are ready to start using the zone again...

global# zoneadm -z disposable boot; zlogin -C disposable

[Connected to zone 'disposable' console]

SunOS Release 5.11 Version snv_80 64-bit

Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.

Use is subject to license terms.

Hostname: disposable

Reading ZFS config: done.

disposable console login:

As expected, you will find that all changes are reverted. Besides the normal application test environment, one other area where I think this would be quite handy is in a class-room situation, where you can allow the students full root access in the zone, and at the end of the day quickly recover the system to a sane state for the next day's class.

All in all that was Q-E-D. This principle, as well as the information from my previous blog posting will form the basis of the next few posts.