The Problem

Suppose that someone in your organization has created a NeCTAR instance and set up some important system on it ... and then left without doing a proper hand-over of the SSH keys that allow access. (Or maybe you have lost the keys in some disaster with your workstation, and you hadn't backed them up ...)

How do you get back into the Instance?

Gaining external control of the Instance

In order to get into an instance whose SSH keys have been lost, you first need (at least) "member" access to the NeCTAR project (tenant) that the instance belongs to. Your options are as follows:

  • Ask the current project owner to make you a member.  This can be done through the Dashboard.
  • Ask NeCTAR RC support (or your local Node support) to make you a member of the project, or to transfer ownership of the project to you. They will need to ask for a justification and for proper authorization before doing this, so be prepared to provide them with what they need.

Getting Into the Instance

Once you have external control (i.e. access rights at the NeCTAR Dashboard level), there are four ways that could potentially get you into the Instance.

Logging in via the console

If you know the instance's "root" password (or can guess it), you can open the instance's console via the Dashboard and log in as "root".  Once you are in, you can add extra SSH keys to the admin account's "authorized_keys" file to restore normal SSH access.

Unfortunately, if you don't know the root password, this is a bit of a long shot:

  • There is a good chance that "root" access using a password is disabled entirely.  This is the initial state for most NeCTAR images.
  • If the previous owner set a "root" password, there is a good chance that you won't be able to guess it.  (Don't believe those silly Hollywood movies ...) 

Personally, I wouldn't spend much time on "password guessing" because the chances of success are (IMO) relatively small.
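If the console route does work, the key can be added with something like the sketch below, run as root at the console. The function name "append_authorized_key", the "ec2-user" account and the key in the usage line are hypothetical placeholders; substitute the image's actual admin account and your own public key.

```shell
# Hypothetical helper: append one public key line to an account's
# authorized_keys. Arguments: home directory, owning account, key line.
append_authorized_key() {
  local home=$1 owner=$2 key=$3
  local dir="$home/.ssh"
  mkdir -p "$dir"
  touch "$dir/authorized_keys"
  # Only append if this exact line is not already present.
  if ! grep -qxF "$key" "$dir/authorized_keys"; then
    printf '%s\n' "$key" >> "$dir/authorized_keys"
  fi
  # With its default StrictModes setting, sshd can refuse keys kept in a
  # file or directory that other users can write to.
  chmod 700 "$dir"
  chmod 600 "$dir/authorized_keys"
  chown "$owner" "$dir" "$dir/authorized_keys"
}
```

Usage: append_authorized_key /home/ec2-user ec2-user "ssh-ed25519 AAAA... you@workstation". Because it checks for the exact line first, it is safe to run more than once.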

Snapshot and relaunch with new SSH keys

The next thing to try is to take a snapshot of the Instance and then launch a new instance from it, supplying a different SSH keypair.  What should happen is that the "cloud-init" utility notices that this is a new instance, and adds the supplied SSH key to the "authorized_keys" file for the admin account.

However, this doesn't always work.  For example:

  • It only works if the instance has 'cloud-init' installed and properly configured.  This should be the case for VM images that are derived from the NeCTAR base images, but images that started life outside of NeCTAR may not have cloud-init installed.
  • I've come across cases where the relaunched instance ends up with dead networking.

More importantly, even if you can get into the relaunched instance, it is not the same as the original one. Specifically, the contents of the original instance's ephemeral filesystem are NOT included in the snapshot, so you won't be able to see what was in the original "/mnt" directory.
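With the nova command-line client (and your project's RC file sourced), the snapshot-and-relaunch approach might look like the sketch below. The function name and the "-relaunched" suffix for the new instance's name are hypothetical; pick a flavor at least as large as the original instance's.

```shell
# Hypothetical helper: snapshot an instance and boot a copy of it that uses a
# key pair you control. Arguments: instance name/ID, snapshot name, key pair
# name, path to your public key, flavor name.
relaunch_with_new_key() {
  local server=$1 snapshot=$2 keyname=$3 pubkey=$4 flavor=$5
  # Register your public key with Nova under a new key pair name.
  nova keypair-add --pub-key "$pubkey" "$keyname" || return 1
  # Snapshot the instance; --poll waits until the snapshot is finished.
  # The ephemeral disk ("/mnt") is NOT part of the snapshot.
  nova image-create --poll "$server" "$snapshot" || return 1
  # Boot from the snapshot; cloud-init in the guest should install the
  # new key for the admin account on first boot.
  nova boot --flavor "$flavor" --image "$snapshot" --key-name "$keyname" \
    "${server}-relaunched"
}
```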

Using "nova rescue"

The third possibility is using the "nova rescue" subcommand to boot up a rescue instance, as described here.

This approach should allow you to repair an instance without destroying its ephemeral file system.  However, there is a catch.  When the rescue instance is created, it appears that OpenStack uses the SSH keypair that is currently associated with the instance you are trying to rescue. If you have lost the private key for the original, you won't be able to SSH into the rescue instance either.
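Starting the rescue from the command line might look like this, assuming the nova client and a sourced RC file (the function name is hypothetical). Checking "key_name" first tells you which keypair the rescue instance is expected to receive.

```shell
# Hypothetical helper: show which key pair Nova recorded for an instance,
# then boot it into rescue mode.
start_rescue() {
  local server=$1
  # The key_name row is the key pair the instance was launched with.
  nova show "$server" | grep key_name
  # The output of "nova rescue" should include a temporary adminPass.
  nova rescue "$server"
}
```

When the repair is finished (the original disk is typically attached to the rescue instance as a second disk), "nova unrescue <server>" boots the instance normally again.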

There are two possibilities to try:

  • You may be able to log in on the rescue instance's console (via the Dashboard) using the admin account and the temporary password that "nova rescue" reports.  (That won't work over SSH, because NeCTAR images have "sshd" configured to bar all SSH logins with a password.)
  • You may be able to "pull a swifty" by deleting the keypair and uploading a new keypair with the same name. (It appears that Nova identifies keypairs by name rather than by id. If this works, the rescue instance will use the new public key, and you will be able to login ... and repair the orphaned instance.)

Note: as above, this only works for VM instances that have cloud-init installed and configured.  (It is used to plant the SSH key in the rescue instance.)
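The "swifty" could be attempted as sketched below. It is untested, and it only makes sense if the original keypair name shows up in "nova keypair-list" for your own user account. The function name is hypothetical.

```shell
# Hypothetical helper (untested): replace the public key stored under an
# existing key pair name with one whose private key you hold.
replace_keypair() {
  local keyname=$1 pubkey=$2
  # This removes only Nova's stored copy of the public key; keys already
  # installed inside running instances are not touched.
  nova keypair-delete "$keyname" || return 1
  # Re-create the key pair under the same name with your public key.
  nova keypair-add --pub-key "$pubkey" "$keyname"
}
```

After that, run "nova rescue <server>" and try to SSH into the rescue instance with your new private key.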

Snapshot, repair snapshot image, and rebuild the instance

This approach is rather convoluted, but it should work in all cases.

This requires a system (desktop, laptop, whatever) with the Glance client installed, and an OpenStack password that you can use to access the project's images in Glance; see Using Openstack clients for NeCTAR.

It also requires tools for mounting a QCOW2 image as a file system on your desktop / laptop.  Assuming that you are using Linux, I recommend that you install and use "qemu-nbd" for this. On RHEL-based and Fedora-based systems, use yum to install the "qemu-img" package.

Step #1: snapshot the NeCTAR instance
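Step #1 can be done from the Dashboard, or with the nova client as sketched below. Stopping the instance first is my suggestion rather than part of the recipe: it avoids snapshotting a file system that is being written to, at the cost of some downtime. The helper names and the timeout (60 polls, 5 seconds apart) are hypothetical.

```shell
# Hypothetical helpers: stop an instance, wait for it to reach SHUTOFF, then
# snapshot it. Requires the nova client and a sourced project RC file.
wait_for_shutoff() {
  local server=$1 tries=${2:-60}
  while [ "$tries" -gt 0 ]; do
    # "nova show" prints a table; look for the row "| status | SHUTOFF |".
    if nova show "$server" | grep -q '| status *| SHUTOFF'; then
      return 0
    fi
    sleep 5
    tries=$((tries - 1))
  done
  return 1
}

snapshot_for_repair() {
  local server=$1 snapshot=$2
  nova stop "$server" || return 1
  wait_for_shutoff "$server" || { echo "instance did not stop" >&2; return 1; }
  # --poll waits until the snapshot has finished.
  nova image-create --poll "$server" "$snapshot"
}
```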

Step #2: download the snapshot from Glance:

 

$ . <projectname>-openrc.sh
Enter project password: ...
$ glance image-download --file /path/to/image <IMAGE>

where <IMAGE> is the name or ID of the snapshot image.

Step #3: mount the image as a file system.  (I'm assuming that you are using qemu-nbd for this.)

$ sudo mkdir /mnt/image
$ sudo modprobe nbd max_part=63
$ sudo qemu-nbd -c /dev/nbd0 /path/to/image
$ sudo mount /dev/nbd0p1 /mnt/image

Step #4: Go into the file system mounted on "/mnt/image", find the account that should be used for login, and add the appropriate SSH public key(s) to its "~/.ssh/authorized_keys" file. (You could also attempt to add a root password or install the cloud-init package, but it is probably simpler and safer to do those things later.)
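A sketch of Step #4, to be run as root (e.g. from "sudo -i") because the files in the image belong to other users. It looks the account up in the image's own /etc/passwd, since user IDs inside the image need not match those on your workstation. The function name and the "ec2-user" account in the usage line are hypothetical.

```shell
# Hypothetical helper: add a public key to ACCOUNT's authorized_keys inside an
# image mounted at ROOT. Arguments: mount point, account, public key file.
add_key_to_image() {
  local root=$1 account=$2 pubkey_file=$3
  local entry uid gid home sshdir
  entry=$(grep "^${account}:" "$root/etc/passwd") || {
    echo "no account '$account' in $root/etc/passwd" >&2
    return 1
  }
  uid=$(printf '%s\n' "$entry" | cut -d: -f3)
  gid=$(printf '%s\n' "$entry" | cut -d: -f4)
  home=$(printf '%s\n' "$entry" | cut -d: -f6)
  sshdir="$root$home/.ssh"
  mkdir -p "$sshdir"
  cat "$pubkey_file" >> "$sshdir/authorized_keys"
  # sshd may ignore keys whose file or directory is writable by others.
  chmod 700 "$sshdir"
  chmod 600 "$sshdir/authorized_keys"
  # Numeric IDs from the image, so the host's user database is not consulted.
  chown "$uid:$gid" "$sshdir" "$sshdir/authorized_keys"
}
```

Usage: add_key_to_image /mnt/image ec2-user /path/to/your_key.pub. If the image runs SELinux in enforcing mode (common on CentOS and Fedora images), sshd may ignore a file created this way because of its security label; creating an empty "/mnt/image/.autorelabel" file asks the guest to relabel its file systems on the next boot.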

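Step #5: Unmount the image and disconnect the NBD device.  This step is critical: it ensures that all of your changes have been written back to the image file before you upload it.

$ sudo umount /mnt/image
$ sudo qemu-nbd -d /dev/nbd0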
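Because skipping Step #5 can leave you uploading an image with missing changes, Steps #3 to #5 can be wrapped so that the unmount and disconnect always happen, even if the work in between fails. This is a sketch that assumes the first partition holds the root file system and that /dev/nbd0 is free; the function name and the one-second pause are my assumptions.

```shell
# Hypothetical helper: attach IMAGE with qemu-nbd, mount its first partition
# on MOUNTPOINT, run the remaining arguments as a command, then always
# unmount and disconnect, returning the command's exit status.
with_mounted_image() {
  local image=$1 mnt=$2 status
  shift 2
  sudo modprobe nbd max_part=63 || return 1
  sudo qemu-nbd -c /dev/nbd0 "$image" || return 1
  sleep 1  # assumption: give the kernel time to create /dev/nbd0p1
  if ! sudo mount /dev/nbd0p1 "$mnt"; then
    sudo qemu-nbd -d /dev/nbd0
    return 1
  fi
  "$@"
  status=$?
  # Step #5: flush and release the image whatever happened above.
  sudo umount "$mnt"
  sudo qemu-nbd -d /dev/nbd0
  return "$status"
}
```

Usage: with_mounted_image /path/to/image /mnt/image sudo ls /mnt/image/home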

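Step #6: Upload the modified image to Glance as a new image.

$ glance image-create --name <NEW-IMAGE> --disk-format qcow2 --container-format bare --file /path/to/image

Glance normally refuses to replace the data of an existing image once it is active, so "image-update --file" on the original snapshot is likely to fail; creating a new image also keeps the unmodified snapshot as a fallback.  Use the same disk format as the original snapshot ("glance image-show <IMAGE>" reports it); "qcow2" is assumed above.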

Step #7: Launch a new instance from the image, and check that you can SSH into the instance now.

Step #8: (Optional) Use "nova rebuild" to re-image your existing instance with the modified image.  Use "--preserve-ephemeral".  Beware: I haven't tried this out, and in the worst case it could go horribly wrong and destroy your instance.
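If you do try Step #8, a confirmation prompt is cheap insurance. This is a sketch only (as untested as the step itself), and the function name is hypothetical. Before relying on "--preserve-ephemeral" it would be worth asking your Node support whether their hypervisor driver honours it, since as far as I know some Nova drivers reject that option.

```shell
# Hypothetical helper: re-image SERVER from IMAGE, asking Nova to keep the
# ephemeral disk. Asks for confirmation on stdin first.
rebuild_from_repaired_image() {
  local server=$1 image=$2 answer
  printf 'Rebuild %s from image %s? [y/N] ' "$server" "$image"
  read -r answer
  if [ "$answer" != y ]; then
    echo "aborted" >&2
    return 1
  fi
  nova rebuild --preserve-ephemeral "$server" "$image"
}
```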

Are there any other options?

It looks like "nova rebuild" might be able to use the "--file <dst-path=src-path>" option to replace a "~/.ssh/authorized_keys" file.  I have not tried this.
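An untested sketch of that idea follows. Two assumptions: because a rebuild re-images the root disk, the image should be a snapshot of the instance itself rather than its original base image; and the injected file only appears if the cloud delivers injected files to the guest (for example via a config drive read by cloud-init). The function name and destination path are hypothetical.

```shell
# Hypothetical helper (untested): rebuild SERVER from IMAGE, asking Nova to
# place a local authorized_keys file at DEST inside the guest.
rebuild_with_keys_file() {
  local server=$1 image=$2 keys_file=$3 dest=$4
  # --file takes dst-path=src-path: the path inside the instance first,
  # then the local file to copy there.
  nova rebuild --file "${dest}=${keys_file}" "$server" "$image"
}
```

Usage: rebuild_with_keys_file my-instance my-instance-snap ./authorized_keys /home/ec2-user/.ssh/authorized_keys. Add "--preserve-ephemeral", as in Step #8, if you need to keep the contents of "/mnt".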

In theory, it might also be possible for the Node Operations staff to shut down the instance and then access the disc image "behind the back" of OpenStack.  However, this is likely to be a dangerous thing to do ... and my local Ops guys have said that they really, really don't want to do this kind of thing.