Showing posts with label snapshots. Show all posts

Wednesday, June 4, 2008

Sharing a ZFS pool between Linux and Solaris

If you are multi-booting between Linux and Solaris (and others like FreeBSD, OpenBSD and Mac OS X, I expect) you will sooner or later encounter the problem of how to share disk space between the operating systems. FAT32 is not satisfactory due to its lack of POSIX features, in particular file ownership and access modes, not to mention its subpar performance. ext2/3 is not an option because you only get read-only support for it in Solaris, and similarly UFS enjoys only read-only support in Linux. The whole situation is rather depressing.


Enter ZFS.


This all started because I discovered that I can have a ZFS root file system without having to install OpenSolaris. The trick, as some of you may know, is to select "Solaris Express" from the first menu on booting the install disk, and then select one of the two "Interactive Text" options from the next menu. This puts you back into 1984 in terms of installers, but you get the option of using ZFS for root!


Note: It might be possible to do this with the default installer, but on my computer the installer just would not run (I got some daft error about fonts and mouse themes). With a ZFS root, Swap and Dump automatically go onto dedicated ZFS volumes (zvols) in the root pool, and you save a lot in terms of pre-allocated space.


I have of course used ZFS on my laptop previously as a test, but the benefits were limited by the fact that I still had "slices" for the OS and a small ZFS pool on a spare slice.


I'm not sure which build of Nevada first introduced the ZFS root option in the installer, but it is available in build 90 at least.


My choice of Linux distribution is Ubuntu 8.04. The steps to setting up a ZFS pool shared across operating systems are as follows:


1. Select a Partitioning scheme with minimal space allocated to each of Ubuntu and Nevada.
I decided to put Ubuntu in an Extended partition with a 10 GB Logical Partition for the OS, /var and /home, and a 1 GB Logical partition for Swap.
For Solaris I allocated a 24 GB primary partition to become the ZFS root pool, which includes Swap, Dump, OS and Live-upgrade space.
The balance of the 100 GB disk will be shared between Ubuntu and Solaris using ZFS.


Note: Linux and Solaris have somewhat different views on how disk partitioning works.
For historical reasons, in particular compatibility with Solaris on SPARC hardware, Solaris slices live in a single primary partition with an identifier of 0x82 (SOLARIS) or 0xbf (SOLARIS2), somewhat like how logical fdisk partitions live inside an "extended partition".


2. Install Ubuntu first, creating only the partitions for it. Remember to not have any external drives connected as it can screw up the order in which drives are detected and as a result bugger up the Grub menu list.


During the installation you create an Admin user. This will eventually become a "backup" admin user.


3. Reboot and load patches/updates, and back up the Grub /boot/grub/menu.lst file to external media such as a USB thumb drive for easy access. The Ubuntu Grub does not understand ZFS, so you need to use Nevada's Grub to manage the multi-booting.


4. Also set Ubuntu to use the hardware clock as local time instead of UTC (this is what Solaris uses). To do this, change UTC=yes to UTC=no in /etc/default/rcS, then reboot.
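
The edit in step 4 can also be scripted. The following is only a sketch: the helper name set_localtime_clock is made up for illustration, and it assumes the file contains a line that reads exactly UTC=yes. It takes the file name as an argument so that it can be tried on a copy first.

```shell
#!/bin/sh
# Hypothetical helper: switch an rcS-style file from UTC=yes to UTC=no.
# The file path is passed in so the function can be tested on a copy.
set_localtime_clock() {
    rcs_file="$1"
    # Keep a backup next to the original before editing.
    cp "$rcs_file" "$rcs_file.bak" || return 1
    # Rewrite only a line that is exactly UTC=yes; everything else is untouched.
    sed 's/^UTC=yes$/UTC=no/' "$rcs_file.bak" > "$rcs_file"
}

# Example (as root, on the real file):
#   set_localtime_clock /etc/default/rcS
```

The reboot mentioned in step 4 is still needed afterwards.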


5. Install Nevada. Use either of the Interactive Text installer options, but for simplicity's sake specify the system as non-networked.


6. Reboot and create a user for every-day use, and add this user to the "Primary Administrator" role using usermod -P "Primary Administrator" <userid>


7. Add the Ubuntu Grub entries you saved in step 3 to the end of the Nevada Grub menu.lst file. With a ZFS root this file is stored in the root pool's top-level file system, at /rpool/boot/grub/menu.lst (the default pool name is rpool).


8. Reboot back into Ubuntu, then follow the Linux ZFS-FUSE installation instructions to get ZFS-FUSE installed. I used the trunk to get the latest ZFS updates from Opensolaris.org included. Also see this Ubuntu Wiki page, and Ralf Hildebrand's blog for more info.


For reference, this is the procedure I used

sudo apt-get install mercurial build-essential scons libfuse-dev libaio-dev devscripts zlib1g-dev
cd ~
hg clone http://www.wizy.org/mercurial/zfs-fuse/trunk
cd trunk/src
scons
sudo scons install


9. Create an fdisk partition for the shared ZFS pool using the remaining disk space. I used a primary partition and set the identifier to W95 FAT32, though this is probably unimportant.


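10. While still running Ubuntu, start the zfs-fuse daemon and create a ZFS pool on this new fdisk partition using commands like these, where <device> is a placeholder for the new partition's device node:

sudo /usr/local/sbin/zfs-fuse
sudo /usr/local/sbin/zpool create -m /shared SHARED <device>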


I like to give my ZFS pools names in all-capitals, purely because it makes the ZFS pool devices stand out better in the output from df and mount.


WARNING: I found that if I created the ZFS pool under Solaris, it refused to import into Ubuntu, but if I created it under Linux it imports/exports just fine in both directions. Both pools are created as version 10 pools, so the reason for this is not obvious. If you do decide to experiment with creating the pool under Solaris, when you want to really get rid of the pool you will discover you need to dd zeros over the pool before creating it again, otherwise the condition remains unchanged despite destroying and re-creating the pool. If you do experiment with this please do share your results!


11. Export the ZFS pool using

/usr/local/sbin/zpool export SHARED


12. Reboot into Nevada and import the pool using

/usr/sbin/zpool import SHARED


Note: If you forget to export before you shut down, you will need to add -f to force the import after booting into the other OS.


At this point I just sat there and stared in wonder at how well it actually works. There is beauty in finally seeing this working!


13. Create some init.d / rc scripts to automate the import/export on shutdown/startup.
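
For the Ubuntu side, such a script might look roughly like the sketch below. It is not a tested init script: the function name shared_pool and the RUN, POOL and SBIN variables are illustrative, the paths are the /usr/local/sbin ones used in the steps above, and it assumes that zfs-fuse puts itself into the background when started.

```shell
#!/bin/sh
# Sketch of an init.d-style script for importing/exporting the shared pool.
POOL="${POOL:-SHARED}"
SBIN="${SBIN:-/usr/local/sbin}"
# Set RUN=echo to print the commands instead of executing them.
RUN="${RUN:-}"

shared_pool() {
    case "$1" in
        start)
            # Start the zfs-fuse daemon first, then import the pool.
            $RUN "$SBIN/zfs-fuse" &&
            $RUN "$SBIN/zpool" import "$POOL"
            ;;
        stop)
            # Export cleanly so the other OS can import without -f.
            $RUN "$SBIN/zpool" export "$POOL"
            ;;
        *)
            echo "Usage: shared_pool {start|stop}" >&2
            return 1
            ;;
    esac
}

# In an installed init script the last line would be:
#   shared_pool "$1"
```

On Ubuntu a script like this would normally live in /etc/init.d and be registered with update-rc.d. The Solaris side needs an equivalent that runs zpool import at startup and zpool export at shutdown.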


14. Now you can start customizing both operating environments. You may want to set up automatic network configuration by enabling the SMF service for NWAM in Solaris, e.g. by doing:

pfexec svcadm disable physical:default
pfexec svcadm enable physical:nwam


I'm looking forward to testing Live Upgrade on my setup with ZFS root, and to getting a shared home directory to work well for both Solaris and Ubuntu. I have created a login ID with the same gid/uid and a home directory under the shared ZFS pool, but after a few changes it got broken under Ubuntu, probably due to subtle differences in how Gnome/Desktop config items are stored and/or expected.


Despite my initial scepticism about FUSE, it is actually quite functional. All-in-all I love being able to share a file system, well, many file systems, between the two operating environments!



Sunday, March 9, 2008

Automating the system identification for a Solaris zone to speed up zone deployment


Recapping, I demonstrated how to create a basic Solaris zone from scratch. Then I showed how to use ZFS snapshots to add the ability to “reset” a zone to a clean state, and how to speed up the definition step by exporting a zone configuration file and then using this as a template for defining zones.


This can save a considerable amount of time with complex zones. The other two steps of creating a zone, namely installing it (populating it with files) and setting it up by completing the system identification during the first boot, can also be improved on: the first by using the zoneadm “clone” feature, and the second by using a pre-defined sysidcfg file (and maybe a few other tweaks) injected into the zone file system before it is booted the first time.


This blog entry talks about the second of these.


The sysidcfg file is simply a text file with lines specifying the values for the various options. This file is placed in the /etc directory of the zone's root file system before the zone is booted. Then during boot-up the file is read and any values specified in it are used without prompting, while any missing items will be prompted for as per normal.


The items that can be set are as follows:


| Item | Variable Name | Description of Values |
|---|---|---|
| Security Policy | security_policy | Kerberos or NONE. If set to “kerberos”, additional properties can be set. If not specified, you will be prompted for a value. |
| Name Service | name_service | NIS, NIS+, LDAP, DNS, NONE. Some additional properties are available when using NIS, NIS+, LDAP or DNS. If not specified, you will be prompted for the appropriate value(s). |
| NFSv4 Domain Name | nfs4_domain | Specify either the keyword “dynamic”, or provide the value to be used for the NFSv4 domain name as a fully qualified domain name. If not specified, you will be prompted for the appropriate value(s). |
| Region and Time zone | timezone | Either give the time zone from /usr/share/lib/zoneinfo/* or else specify a GMT-offset value. If not specified, you will be prompted for this information. |
| Terminal Type | terminal | Specify the TERM type, e.g. vt100. If not specified, you will be prompted for this value. |
| Locale | system_locale | Specify a locale, e.g. C, such as found in /usr/lib/locale. If not specified, you will be prompted for this value. |
| Root Password | root_password | The encrypted root password. The easiest way to get this is to make a dummy user, set its password to what you want, and then copy the encrypted value from the /etc/shadow file. Other options include writing a little perl script or C program to produce the encrypted version of a password. If not specified, you will be prompted during the first boot. |
| Network Settings | network_interface | Except for the hostname, these are normally obtained from the zone definition. They can be specified here to override those values, but you will not be prompted for them if they are not specified. |


Note: It is entirely possible to use sys-unconfig in a zone. Doing so will have a similar effect to running sys-unconfig on a global zone or normal Solaris system: The zone will halt and on the next boot you will be presented with prompts for the system identification items. Be aware that sys-unconfig also removes the zone's root key, and a new one will be generated on the first boot after the system was un-configured.


Something else to note is that a zone's “hostname” and “zone name” do not have to be the same. If you do keep them the same, there will be less opportunity for confusion. While the other network settings for a zone are obtained automatically from the zone's definition, the hostname will still be prompted for. To eliminate this prompt, include a network settings section in the zone's sysidcfg file.


Some items available in the sysidcfg file for a normal system cannot be set during a zone's system identification, as they rely on configuring the kernel and a zone does not have its own dedicated kernel. These include items like power management and the Date and Time, including a Time-server.


An example of a basic sysidcfg file might look like this:


bash-3.2# cat sysidcfg

nfs4_domain=dynamic

security_policy=NONE

timezone=Africa/Johannesburg

terminal=vt100

system_locale=C

name_service=NONE

network_interface=PRIMARY {hostname=ziggy.mydomain}


In the above example the keyword PRIMARY is used to automatically select the only interface configured on this zone. This effectively allows for setting the zone's host name in the sysidcfg template with minimal fuss. It is of course also possible to use the interface name.


If any of the options are omitted from the file, those items will be prompted for in the usual manner. I did not specify the root login password, so that will be the only item which will be prompted for during the boot up process.


To test this, do the following:

  1. Define the zone (using zonecfg)

  2. Install the zone (using zoneadm -z zonename install)

  3. Copy the sysidcfg file to the zone's /etc directory, e.g.
    cp sysidcfg.template /export/zones/zonename/root/etc/sysidcfg

  4. Boot the zone and connect to its console, eg
    zoneadm -z zonename boot; zlogin -C zonename
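
Step 3 can be turned into a small helper so that each zone gets its own host name from one template. This is a sketch under assumptions: the function names make_sysidcfg and install_sysidcfg are invented here, the settings are simply the ones from the example file above, and the zone root path has to be supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper: print a sysidcfg, varying only the host name.
make_sysidcfg() {
    zone_hostname="$1"
    cat <<EOF
nfs4_domain=dynamic
security_policy=NONE
timezone=Africa/Johannesburg
terminal=vt100
system_locale=C
name_service=NONE
network_interface=PRIMARY {hostname=$zone_hostname}
EOF
}

# Hypothetical helper: write the file into an installed, not yet booted zone.
# zone_root is the zone's root directory, e.g. /export/zones/zonename/root
install_sysidcfg() {
    zone_hostname="$1"
    zone_root="$2"
    make_sysidcfg "$zone_hostname" > "$zone_root/etc/sysidcfg"
}

# Example (in the global zone, after "zoneadm -z zonename install"):
#   install_sysidcfg ziggy.mydomain /export/zones/zonename/root
```

As in the example file, root_password is left out, so the root password would still be prompted for on first boot.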


And voila! Now you can automate the zone definition and the zone's system identification. In the next part I'll show how to speed up the Installation step.






Thursday, February 21, 2008

Automating Solaris Zone creation

Zones can be treated as cheap, disposable application containers. Automated Zone creation is not necessarily there to allow you to rapidly deploy 1000s of Zones (though it could certainly be used for this purpose given sufficient planning), but allows you to create and use, then delete and easily re-create zones freshly and with a consistent configuration.

You will find that most, if not all, of your zones will use the same naming-services configuration, be in the same time-zone, attach to the same network interface (just with different IP addresses), etc. Many of the System Identification and system configuration settings will be identical or very similar between the Zones.

You might even find that with each new zone you create the same set of user-ids and have them all get their home directories from a central home-directory server. Basically repetitive work. Computers are, in fact, good at doing the same task over and over without getting bored.

If all you want to achieve is to have a clean state to which you can restore a zone easily, then a fine plan would be to use file system snapshots, something like this:

1. Preparation / Setup

1.1. Create a file system structure in which to store the Zone. Since we've got ZFS for free with Solaris there's really no reason not to use it.

1.2. Set up the Zone in this file system, and complete the configuration up to the point where you want to be able to revert back to.

1.3. Shut down the Zone and take a snapshot.

2. Using this Zone:

2.1 Make any instance specific "custom" configuration changes (add some disk space, user-ids, tweak some settings)

2.2 Start the zone and let the users loose in it.

3. Reverting to the clean status

3.1 Bring the zone down (purely to make sure that no processes have files open in the file system containing the zone)

3.2 Recover the file system back to the Snap-shot state.

3.3 Go back to step 2 above.

Before I show an example of doing this using ZFS, suffer me to mention the other techniques involved in automating Solaris Zone creation (Each of which I will cover in a separate blog post in detail)

Firstly copying the Zone configuration. This involves creating a zone config and exporting it to a file to be used as a template in the future. Then each time you want to create a zone based on this template, you just make a few small changes such as the zone-name and IP address, then import this modified copy of the template into a new zone, after which you continue with the normal zone installation.

Using a sysidcfg file and a few other tricks to speed up the zone configuration is quite similar to using a sysidcfg file to pre-configure a system from a jumpstart, and can be used to automate settings such as the timezone, locale, terminal type, networking, and name-services, amongst others.

Cloning Zones to speed up the install process. The Zone management framework from Sun gives us the ability to "clone" a master "template" zone. This involves creating one (or more) template zones which you then leave fully installed and configured, but don't actually ever start up or use, other than to tweak their configurations. This saves time during the actual install and subsequent configuration steps.
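
To give a feel for the first of these techniques, copying the zone configuration, here is a rough sketch. The helper name retarget_zonecfg, the file names and the second IP address are made up for illustration. It assumes the exported configuration contains "set zonepath=" and "set address=" lines and only a single net resource, since every address line would be rewritten.

```shell
#!/bin/sh
# Hypothetical helper: adapt an exported zone configuration for a new zone.
# Reads the template on stdin and writes the modified configuration to stdout.
retarget_zonecfg() {
    old_zone="$1"
    new_zone="$2"
    new_ip="$3"
    # Swap the last component of the zonepath and replace the IP address.
    sed -e "s|^set zonepath=\(.*\)/$old_zone\$|set zonepath=\1/$new_zone|" \
        -e "s|^set address=.*|set address=$new_ip|"
}

# Typical use in the global zone (needs Solaris and root, so commented out):
#   zonecfg -z disposable export -f /var/tmp/template.cfg
#   retarget_zonecfg disposable testzone 192.168.24.134 \
#       < /var/tmp/template.cfg > /var/tmp/testzone.cfg
#   zonecfg -z testzone -f /var/tmp/testzone.cfg
```

After the import, the new zone is installed in the normal way.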

With that out of the way, on to the example of how to make a simple disposable Zone. As always the fixed-width text represents what you should see on the screen. I highlight the bits you enter.

globalzone# zpool create SPACE c0d0s4

globalzone# zfs create SPACE/zones

globalzone# zfs set mountpoint=/export/zones SPACE/zones

globalzone# zfs create SPACE/zones/disposable

globalzone# chmod 0700 /export/zones/disposable

globalzone# zfs set atime=off SPACE/zones/disposable

Disabling of “atime” above is a personal preference thing. Now we set up a simple zone. Yours can be as complicated or as simple as you want it to be.

globalzone# zonecfg -z disposable

zonecfg:disposable> set zonepath=/export/zones/disposable

zonecfg:disposable> add net

zonecfg:disposable:net> set physical=e1000g0

zonecfg:disposable:net> set address=192.168.24.133

zonecfg:disposable:net> end

zonecfg:disposable> verify

zonecfg:disposable> commit

zonecfg:disposable> exit

globalzone# zoneadm -z disposable install

cannot create ZFS dataset SPACE/zones/disposable: dataset already exists

Preparing to install zone .

Creating list of files to copy from the global zone.

Copying <9386> files to the zone.

Initializing zone product registry.

Determining zone package initialization order.

Preparing to initialize <1307> packages on the zone.

Initialized <1307> packages on zone.

Zone is initialized.

Installation of <1> packages was skipped.

Installation of these packages generated warnings:

The file contains a log of the zone installation.

For the eagle-eyed amongst you, the WebStackTooling failure is due to the fact that this is a sparse zone and I'm running beta software (Nevada Build 80). In a sparse zone the /usr file system is read-only and the WebStackTooling is trying to create or change some files. I'm just ignoring this error for now as it does not bother me.

So far, so good. Let's save a backup of what we've got so far.

globalzone# zfs snapshot SPACE/zones/disposable@freshly_installed

Now we perform the first boot and system identification. Below is an abbreviated copy-paste showing the flow of the process.

globalzone# zoneadm -z disposable boot; zlogin -C disposable

[Connected to zone 'disposable' console]

Configuring Services ... 150/150

Reading ZFS config: done.

>>> Select a Language

>>> Select a Locale

>>> What type of terminal are you using?

Creating new rsa public/private host key pair

Creating new dsa public/private host key pair

Configuring network interface addresses: e1000g0.

>>> Host name for e1000g0:1 disposable

>>> Configure Security Policy:

>>> Name Service

>>> NFSv4 Domain Name:

>>> Region and Time zone: Africa/Johannesburg

>>> Root Password

System identification is completed.

rebooting system due to change(s) in /etc/default/init

[NOTICE: Zone rebooting]

SunOS Release 5.11 Version snv_80 64-bit

Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.

Use is subject to license terms.

Hostname: disposable

Reading ZFS config: done.

disposable console login: root

Password:

Feb 20 21:33:05 disposable login: ROOT LOGIN /dev/console

Sun Microsystems Inc. SunOS 5.11 snv_80 January 2008

You may want to make a few more changes now that the Zone is running. Some ideas may be to set up User-IDs, enable/disable some services, and set up some NFS and/or automounter file systems.

# mkdir /export/home

# useradd -c "Joe Blogs" -d /export/home/joeblogs -m joeblogs

# passwd joeblogs

Assuming you've done all you want, this is the point where we have a cleanly built zone, running, and essentially the point that we would like to be able to return to after we did whatever make-and-break or sandbox testing. The Zone should be halted before we take the snapshot, even if only to close all open files.

# halt

Feb 20 21:33:12 disposable halt: initiated by root on /dev/console

Feb 20 21:33:12 disposable syslogd: going down on signal 15

[NOTICE: Zone halted]

~.

[Connection to zone 'disposable' console closed]

Now just take another ZFS snapshot:

globalzone# zfs snapshot SPACE/zones/disposable@system_identified

=================

Now the Zone is ready for you to let your users loose in it. Allow them to have full root access, go crazy, run "rm -r /", etc.

globalzone# zoneadm -z disposable boot; zlogin -C disposable

zoneadm: zone 'disposable': WARNING: e1000g0:1: no matching subnet found in netmasks(4) for 192.168.24.133; using default of 255.255.255.0.

[Connected to zone 'disposable' console]

Hostname: disposable

Reading ZFS config: done.

disposable console login: root

Password:

Feb 20 21:40:11 disposable login: ROOT LOGIN /dev/console

Last login: Wed Feb 20 21:33:05 on console

Sun Microsystems Inc. SunOS 5.11 snv_80 January 2008

Now perform some "work" - Create a few directories, modify some files, etc. I chose to run sys-unconfig.

# sys-unconfig

WARNING

This program will unconfigure your system. It will cause it

to revert to a "blank" system - it will not have a name or know

about other systems or networks.

This program will also halt the system.

Do you want to continue (y/n) ? y

sys-unconfig started Wed Feb 20 21:40:30 2008

sys-unconfig completed Wed Feb 20 21:40:30 2008

Halting system...

svc.startd: The system is coming down. Please wait.

svc.startd: 59 system services are now being stopped.

svc.startd: The system is down.

[NOTICE: Zone halted]

Then, back in the global zone, examine the available ZFS snapshots:

globalzone# zfs list

NAME USED AVAIL REFER MOUNTPOINT

SPACE 684M 14.1G 18K /SPACE

SPACE/zones 684M 14.1G 19K /export/zones

SPACE/zones/disposable 684M 14.1G 624M /export/zones/disposable

SPACE/zones/disposable@freshly_installed 790K - 523M -

SPACE/zones/disposable@system_identified 59.2M - 611M -

These four commands can go nicely into a little "revert" script.

globalzone# zfs clone SPACE/zones/disposable@system_identified \

SPACE/zones/reverted_temp

globalzone# zfs promote SPACE/zones/reverted_temp

globalzone# zfs destroy SPACE/zones/disposable

globalzone# zfs rename SPACE/zones/reverted_temp SPACE/zones/disposable
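
Wrapped up as a script, the four commands might look something like the sketch below. The function name revert_zone and the RUN variable are illustrative only. The temporary clone is assumed to sit next to the zone's dataset under the name reverted_temp, and the zone is assumed to be halted before the function is called.

```shell
#!/bin/sh
# Sketch of a "revert" script built from the clone/promote/destroy/rename steps.
# Set RUN=echo to print the zfs commands instead of running them.
RUN="${RUN:-}"

revert_zone() {
    dataset="$1"      # e.g. SPACE/zones/disposable
    snapshot="$2"     # e.g. system_identified
    temp="$(dirname "$dataset")/reverted_temp"

    # Each step only runs if the previous one succeeded.
    $RUN zfs clone "$dataset@$snapshot" "$temp" &&
    $RUN zfs promote "$temp" &&
    $RUN zfs destroy "$dataset" &&
    $RUN zfs rename "$temp" "$dataset"
}

# Example (in the global zone, with the zone halted):
#   revert_zone SPACE/zones/disposable system_identified
```

Running it once with RUN=echo is a cheap way to check the dataset names before anything is destroyed.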

That took just a few seconds, and we are ready to start using the zone again...

globalzone# zoneadm -z disposable boot; zlogin -C disposable

[Connected to zone 'disposable' console]

SunOS Release 5.11 Version snv_80 64-bit

Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.

Use is subject to license terms.

Hostname: disposable

Reading ZFS config: done.

disposable console login:

As expected, you will find that all changes are reverted. Besides the normal application test environment, one other area where I think this would be quite handy is in a class-room situation, where you can allow the students full root access in the zone, and at the end of the day quickly recover the system to a sane state for the next day's class.

All in all that was Q-E-D. This principle, as well as the information from my previous blog posting will form the basis of the next few posts.