
SUT Operations

This tutorial demonstrates how to work with test machines (Systems Under Test, or SUTs) in Crucible. It covers remotely rebooting a machine, patching and installing its kernel, viewing its console logs, running tests manually, and tips for troubleshooting common problems.

Contents:

  1. Logging into the Crucible Driver
  2. Requeuing a Test Run
  3. Locking a SUT
  4. Remotely Power Cycling a SUT
  5. Logging into a SUT
  6. Rebuilding and Reinstalling the Kernel
  7. Booting the SUT to a Given Kernel
  8. Watching a SUT's Console
  9. Starting Test-Specific Services

Logging into the Crucible Driver

SUT operations are performed on the Driver system, which schedules test runs and provides other services to the SUTs. From this machine you can perform various operations on the SUTs, and you can also SSH into the SUTs themselves.

To access the Driver system, you need a login account and your SSH public key on the Driver. If you don't have an account, or think there may be a problem with it, contact the Crucible administrator.

If your account is set up correctly, you can log in with SSH, e.g.:

  $ ssh my_name@crucible.osdl.org

Crucible's tools are generally installed in a dedicated directory such as /testing/usr/bin. Check that this directory is included in your $PATH variable; a quick way to do so is:

  $ which sut
  /testing/usr/bin/sut
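If `which sut` prints nothing, you can add the directory to your PATH for the current session. This is a minimal sketch that assumes the tools live in /testing/usr/bin, as above; adjust the path if your Driver uses a different directory. The same lines can go in a shell startup file such as ~/.profile to make the change permanent.

```shell
# Assumes Crucible's tools are installed in /testing/usr/bin; adjust if not.
crucible_bin=/testing/usr/bin

# Prepend the directory to PATH only if it is not already listed,
# so running this more than once does not add duplicate entries.
case ":$PATH:" in
  *":$crucible_bin:"*) ;;
  *) PATH="$crucible_bin:$PATH"; export PATH ;;
esac
```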

Most sut operations require sudo access; if you get "permission denied" errors, missing sudo rights are the most likely cause.

Requeuing a Test Run

If you are investigating an issue found during a previous test run, it may be worthwhile to rerun the test. This both verifies that the issue still exists on the system and sets up the right environment for you to investigate.

There are two ways to requeue a test. If you know the Test Run ID, you can requeue it from the Driver:

  $ testrun requeue 510

Alternatively, you can queue it using the package name and the name of the patch or software:

  $ queue_package \
      nfsv4/linux-2.6.17-rc5-g1a2098e-server-cluster-locking-api.diff

You can look in /testing/packages/ for available packages and patches.

Watch `sut status` and `testrun status` to see what got queued, and when it begins to run on the SUTs.

Note that in either case, additional test runs may be queued as well. If you like, you can cancel the unneeded runs like this:

  $ testrun cancel 1234
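If several unneeded runs were queued, a small loop saves some typing. This is a sketch, not a Crucible command: `cancel_runs` is a hypothetical shell function that simply calls `testrun cancel` once per ID, exactly as in the example above, and refuses anything that isn't a plain number.

```shell
# cancel_runs: hypothetical helper (not part of Crucible).
# Calls `testrun cancel ID` once for each numeric run ID given.
# Returns non-zero if any ID was invalid or any cancel failed.
# Usage: cancel_runs 1234 1235 1236
cancel_runs() {
  cancel_rc=0
  for id in "$@"; do
    case $id in
      ''|*[!0-9]*)
        # Reject empty or non-numeric IDs instead of passing them on.
        echo "skipping invalid run ID: '$id'" >&2
        cancel_rc=1
        ;;
      *)
        testrun cancel "$id" || cancel_rc=1
        ;;
    esac
  done
  return "$cancel_rc"
}
```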

While a test run is executing, you can follow its progress with:

  $ testrun info 1232

Locking a SUT

To take a SUT out of service so that it stops running tests, lock it with the sut script:

  $ sudo /testing/usr/bin/sut lock nfs04

You can review the state of all the machines like this:

  $ sut status

  SUT               RUN       STATE     PKG   
  amd01             947       finished  patch-2.6.17-git1.bz2
  ita01                       unknown   unknown
  nfs02             932       running   linux-2.6.17-g9eb516f-nfs-server-stable.diff
  nfs03             932       running   linux-2.6.17-g9eb516f-nfs-server-stable.diff
  nfs04             LOCK      unknown   unknown
  nfs05             945       finished  linux-2.6.17-rc1-CITI_NFS4_ALL-1.diff
  nfs06             891       finished  linux-2.6.17-g4bee93e-nfs-server-stable.diff
  nfs07             891       finished  linux-2.6.17-g4bee93e-nfs-server-stable.diff
  nfs08             LOCK      unknown   unknown
  nfs09             LOCK      unknown   unknown
  nfs10             949       finished  patch-2.6.17-git1.bz2
  nfs11             897       finished  cairo-1.2.0.tar.gz
  nfs12             929       running   linux-2.6.17-rc5-g1a2098e-server-cluster-locking-api.diff
  nfs13             929       running   linux-2.6.17-rc5-g1a2098e-server-cluster-locking-api.diff
  ppc01             879       finished  linux-2.6.17-gdbd8524-nfs-server-stable.diff

Once locked, the SUT will finish whatever step of its current test run it was working on, and then go idle.

Don't forget to unlock the SUT when you're done with it:

  $ sudo /testing/usr/bin/sut unlock nfs04
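Before you log off, it's worth checking that you haven't left any machines locked. A minimal sketch, assuming the column layout shown in the `sut status` listing above (SUT name first, LOCK in the RUN column); `locked_suts` is a hypothetical filter, not a Crucible command.

```shell
# locked_suts: hypothetical filter (not part of Crucible).
# Reads `sut status` output on stdin and prints the name of every SUT whose
# RUN column (field 2) reads LOCK, assuming the layout shown above.
locked_suts() {
  awk '$2 == "LOCK" { print $1 }'
}

# Usage on the Driver:
#   sut status | locked_suts
```

Run against the listing above, this would print nfs04, nfs08 and nfs09.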

Remotely Power Cycling a SUT

Sooner or later, a test will put a machine into a bad state or hang it. You can power cycle a machine using the sut script. For example, to power cycle 'nfs04', run:

  $ sudo /testing/usr/bin/sut power nfs04

You can review the power status like this:

  $ sudo /testing/usr/bin/sut power nfs04 status

[Note that some systems may not have remote power control set up, either because the owner doesn't want to allow automated reboots or because the hardware or software doesn't support power management. To see whether and how a SUT does power management, look for a script /testing/suts//bin/power. If there is no such script, the machine has not been set up for power management within Crucible.]
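After a power cycle, the SUT can take a few minutes to come back. A minimal sketch of a polling loop, assuming the SUT answers ICMP ping once it is up (a machine may respond to ping slightly before sshd is ready); `wait_for_sut` is hypothetical and not part of Crucible, and the `-W` timeout flag is the Linux iputils form.

```shell
# wait_for_sut: hypothetical helper (not part of Crucible).
# Pings HOST until it answers or MAX_TRIES attempts have failed.
# Usage: wait_for_sut HOST [MAX_TRIES] [INTERVAL_SECONDS]
wait_for_sut() {
  wfs_host=$1
  wfs_max=${2:-60}        # default: 60 tries
  wfs_interval=${3:-10}   # default: 10 seconds between tries
  wfs_try=0
  while [ "$wfs_try" -lt "$wfs_max" ]; do
    # One echo request, waiting at most 2 seconds for a reply (iputils ping).
    if ping -c 1 -W 2 "$wfs_host" >/dev/null 2>&1; then
      echo "$wfs_host is responding"
      return 0
    fi
    wfs_try=$((wfs_try + 1))
    sleep "$wfs_interval"
  done
  echo "$wfs_host did not respond after $wfs_max tries" >&2
  return 1
}

# Usage on the Driver (the initial sleep is a guess, meant to avoid pinging
# the SUT before it has actually gone down):
#   sudo /testing/usr/bin/sut power nfs04; sleep 30; wait_for_sut nfs04
```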

Logging into a SUT

To log into a SUT, first log into the Driver, then from there you can ssh into the SUT directly, as root:

  $ ssh my_name@crucible.osdl.org
  $ ssh root@nfs04
  #

The root password for SUTs is simply 'password'.

To kill any lingering test processes that might interfere with your work, first identify them with:

  # ps aux | grep '[R]UNNING'
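Once you've reviewed that list, you can turn it into PIDs to kill. A minimal sketch, assuming (as the command above does) that test processes can be recognized by RUNNING in their command line, and that `ps aux` prints the PID in the second column; `running_test_pids` is hypothetical. Check the output before killing anything, since the pattern may also match unrelated processes.

```shell
# running_test_pids: hypothetical filter (not part of Crucible).
# Reads `ps aux` output on stdin and prints the PID (column 2) of each line
# containing RUNNING. The [R] bracket keeps awk from matching its own
# command line, since the text "[R]UNNING" does not contain "RUNNING".
running_test_pids() {
  awk '/[R]UNNING/ { print $2 }'
}

# Usage on the SUT:
#   ps aux | running_test_pids                  # review the PIDs first
#   ps aux | running_test_pids | xargs -r kill  # then send SIGTERM
```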

Rebuilding and Reinstalling the Kernel

Kernels are unpacked by crucible into /usr/src/linux-*, and installed into /boot/kernel-*.

It is good practice to create a separate copy of whatever kernel tree you'll be hacking on, to avoid confusion later:

  # cp -r /usr/src/linux-2.6.17-rc6 /usr/src/linux-2.6.17-rc6-bryce-1

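It's also good practice to set EXTRAVERSION in the tree's Makefile, so that your kernel's modules are installed into their own /lib/modules/ directory instead of clashing with those of the original kernel: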

  # vi /usr/src/linux-2.6.17-rc6-bryce-1/Makefile

  SUBLEVEL = 17
  EXTRAVERSION =-rc6-bryce
  NAME=Crazed Snow-Weasel
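If you'd rather not open an editor, the same change can be made with sed. A minimal sketch: `set_extraversion` is a hypothetical helper, it relies on GNU sed's `-i` (in-place) option, and VALUE must not contain `/`, `&` or backslashes, which sed would interpret specially.

```shell
# set_extraversion: hypothetical helper (not part of Crucible).
# Replaces the EXTRAVERSION line of a kernel Makefile in place (GNU sed -i).
# Usage: set_extraversion MAKEFILE VALUE
set_extraversion() {
  sed -i "s/^EXTRAVERSION[[:space:]]*=.*/EXTRAVERSION =$2/" "$1"
}

# e.g.
#   set_extraversion /usr/src/linux-2.6.17-rc6-bryce-1/Makefile -rc6-bryce
```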

If you like, you can compile and install the kernel manually using the normal Linux kernel build procedure, and update the bootloader configuration to suit. Just be careful not to change the default boot entry! Otherwise, if the new kernel fails to boot, you won't be able to recover the machine remotely.

If you'd rather not do everything by hand, Crucible's kernel management commands are at your disposal. These can be especially useful if you suspect the issue is related to how Crucible builds the kernel. Here's an example of how to use them:

  # build_kernel  [kernel-label] [config-default] [kernel-args]

  # build_kernel linux-2.6.17-rc6-bryce-1 \
                 2.6.17-r6b1 \
                 /testing/packages/linux/config.default \
                 mem=512M

The arguments to build_kernel are: 1) the name of the kernel directory you created above, 2) a short tag to use as the bootloader entry's title, 3) the config file to use, and 4) any kernel arguments you wish to pass.

You can determine which config file a given test run used by looking at the log file for that machine; it is printed within the first line or two after "### RUNNING '010-build_kernel' ###". /testing/packages/linux/config.default is generally used for x86 systems, but if the test needed special config settings or ran on non-x86 hardware, a different config may have been used.
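A quick way to find that line is to search the log for the step marker. This is a sketch: `show_build_config` is hypothetical, and since this tutorial doesn't say where the per-machine log files live, the log path is left as an argument. `grep -A` (print trailing context) is supported by both GNU and BSD grep.

```shell
# show_build_config: hypothetical helper (not part of Crucible).
# Prints the 010-build_kernel marker line from a test run log, plus the two
# lines after it, where the config file in use is reported.
# Usage: show_build_config LOGFILE
show_build_config() {
  # -F: treat the marker as a fixed string; -A 2: two lines of trailing context.
  grep -F -A 2 "### RUNNING '010-build_kernel' ###" "$1"
}
```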

Of course, you can also specify your own config file. ;-)

build_kernel configures and builds the kernel, installs it and its modules, updates the bootloader, and creates an initrd if appropriate.

Booting the SUT to a Given Kernel

Once you've built and installed a kernel, you can boot to it using the boottool command:

  # /testing/usr/bin/boottool --boot-once --title 2.6.17-r6b1
  # reboot && logout

This is analogous to doing 'lilo -R 2.6.17-r6b1' (in fact, on systems using lilo, that's exactly what it does).

Note that some bootloaders (e.g. elilo and yaboot) don't yet have a boot-once capability; to boot a test kernel on those systems, you'll have to make it the default kernel, reboot, and pray. If the kernel fails to boot on such a system, contact the administrator.

Watching a SUT's Console

From the Driver, you can view a SUT's console via 'console':

  $ console nfs04

Helpful commands (^E is Ctrl-E; type each sequence in order):

  ^E c ?     - help
  ^E c p     - replay log
  ^E c u     - host status
  ^E c .     - disconnect

The consoles are also logged to /var/consoles/ on the Driver.
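When a SUT hangs or reboots unexpectedly, the console log is often the only place the kernel's last messages survive. A minimal sketch: `find_kernel_errors` is hypothetical, and because the file naming under /var/consoles/ isn't described here, it takes the log path as an argument. The patterns are common kernel failure messages (oops, panic, BUG), not an exhaustive list.

```shell
# find_kernel_errors: hypothetical helper (not part of Crucible).
# Prints, with line numbers, console-log lines that look like a kernel oops,
# panic or BUG report. Returns non-zero (grep's status) if none are found.
# Usage: find_kernel_errors LOGFILE
find_kernel_errors() {
  grep -n -E 'Oops|Kernel panic|kernel BUG' "$1"
}
```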

Starting Test-Specific Services

Some tests start services at boot, independently of the OS's regular init system. If you boot the system manually, you'll probably need to start these services by hand as well. For instance, NFSv4 test runs require:

  # /testing/usr/bin/init_nfsv4_svcs

You can look at `testrun info $run_id` to see what post-boot actions are performed. These steps correspond to scripts you can usually find in the /testing/runs/$run_id/FINISHED/$sut_id/ directory.
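To see those scripts for a particular run, list that directory. A minimal sketch: `show_run_steps` is hypothetical, it uses the FINISHED path given above (a run that is still in progress may keep its files elsewhere), and the RUNS_ROOT override exists only so the function can be tried against a scratch directory.

```shell
# show_run_steps: hypothetical helper (not part of Crucible).
# Lists /testing/runs/RUN_ID/FINISHED/SUT_ID/, where the per-SUT step
# scripts of a finished run can usually be found.
# Usage: show_run_steps RUN_ID SUT_ID
show_run_steps() {
  srs_dir="${RUNS_ROOT:-/testing/runs}/$1/FINISHED/$2"
  if [ ! -d "$srs_dir" ]; then
    echo "no such directory: $srs_dir" >&2
    return 1
  fi
  ls -1 "$srs_dir"
}
```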