... gives some hints on using DMF - SGI's hierarchical storage management system

Introduction

DMF is an abbreviation for Data Migration Facility, which is the name for SGI's hierarchical storage management software.

Hierarchical storage management (HSM) is a general term for a system that manages data storage on a tiered storage system; e.g. one with one or more "tiers" of disk and/or SSD storage in front of an off-line tier such as tape storage with a tape "robot".

HSM is often used to implement archival storage, off-line or off-site replication, or simply management of large amounts of data that cannot all be kept online.

The Basics

SGI's DMF is typically implemented as a front end file system (on disk) that acts as a cache for a much larger pool of data on tape.  The end-user sees the whole storage pool as a UNIX / Linux file system.  All the files and directories appear to be present (e.g. to the "ls" command) but if an application tries to access a file that has been migrated to tape, there is a pause while the tape is loaded, and the file is read back into the cache.

Naturally, if an application attempts to access a lot of files in a naive fashion, it can generate a large number of tape load events. To help ameliorate this, DMF includes some end-user commands that provide the user with some control over data migration.

Commands

The following end-user DMF commands are available (copied from SGI DMF documentation):

dmarchive(1)

Directly copies data between DMF secondary storage and a POSIX filesystem that is not managed by DMF software, such as Lustre. It is intended to streamline a work flow in which users work in an archive filesystem and later want to archive a copy of their data via DMF software. For more information about the MIN_ARCHIVE_SIZE parameter, see “filesystem Object Parameters” in Chapter 7.

dmattr(1)

Displays whether files are migrated or not by returning a specified set of DMF attributes (for use in shell scripts).

dmcapacity(1)

Displays an estimate of the remaining storage capacity for each VG in each LS. You can optionally choose to report the data formatted into XML or HTML.

dmcopy(1)

Copies all or part of the data from a migrated file to an online file.

dmdu(1)

Displays the number of blocks contained in specified files and directories on a managed filesystem.

dmfind(1)

Displays whether files are migrated or not by searching through files in a directory hierarchy.

dmget(1)

Recalls the specified files.

dmls(1)

Displays whether files are migrated or not by listing the contents of a directory.

dmoper(1)

Displays outstanding requests for operator intervention.

dmput(1)

Migrates the specified files.

dmtag(1)

Allows a site-assigned 32-bit integer to be associated with a specific file (which can be tested in the when clause of particular configuration parameters and in site-defined policies).

dmversion(1)

Displays the version number of the currently installed DMF software.

Each of these commands has a manual entry, and these should be installed along with the commands.  For advice on how to obtain the installers, how to install them and how to configure them for your DMF installation, please contact the operators.

DMF file states

Each file or directory in a DMF file system has an associated DMF state.  The states are summarized as follows:

State  Description
REG    Regular (unmigrated). The file only exists on disk.
MIG    Migrating.  The file is currently being migrated to tape.
ARC    Archiving.  The file is currently being archived to tape.  (This only applies when DMF is being used as an archiving mechanism on an "unmanaged" file system.)
DUL    Dual-state. An up-to-date copy of the file exists both on disk and on tape.
OFL    Offline. The only copy / copies of the file are on tape.
UNM    Unmigrating. DMF is in the process of migrating the file back from tape.
NMG    Nonmigratable file. Your DMF system's file management policy does not allow this file to be migrated. (For example, some installations will only migrate files bigger than a certain size.)
PAR    Partial-state file. The file is partly on disk and partly on tape.  (This only applies with some DMF configurations.)
N/A    DMF cannot determine the file’s state. This may be because the file is not in a DMF-managed filesystem or because an error occurred when attempting to retrieve the file’s state.
INV    The file is in an invalid state. That is, the file’s stat block contains fields that are inconsistent with each other and so do not represent any valid file.

The common state transitions for a file are:

State   Action                               New State
(none)  File is created                      REG
REG     File migration starts                MIG
MIG     File migration completes             DUL
DUL     Disk space is reclaimed              OFL
OFL     File retrieval (unmigration) starts  UNM
UNM     File unmigration completes           DUL

Note that these state transitions (apart from the first one) are typically instigated by the DMF system according to the installation's configured policy, though it is also possible to force the various transitions using the "dmput" and "dmget" commands.

Primary reference

HOW TOs

The following assume that "/data/Qxxxx" is a directory tree on a DMF-managed file system.

Listing files and checking file attributes

On a DMF-managed file system, the standard "ls" command lists all files, whether they are currently on disk or on tape, but it does not show their DMF state.  To see the state of each file, you can use the "dmls" command.  The "dmls" command takes command line options that are largely the same as the standard "ls" command.  (Some versions of "dmls" have an additional "-M" option for just showing the DMF states of files.)

Example:

$ dmls -l /data/Qxxxx
total 608
-rw-rw-r--     1 user    group    124362 2015-04-01 23:47 (DUL) 1
-rw-rw-r--     1 user    group     62299 2015-04-01 23:47 (DUL) 2
-rw-rw-r--     1 user    group     62257 2015-04-01 23:47 (DUL) 3
-rw-r--r--     1 user    group    245582 2015-03-17 15:17 (DUL) CHANGELOG
drwxr-xr-x  2965 user    group     73728 2015-03-31 09:24 (REG) data
drwxrwxr-x     2 user    apache      26 2014-10-22 10:20 (REG) pub

(There are incidental differences between the output formatting for "ls" and "dmls" which may be relevant if you are using "dmls" in a script.)

A second utility for examining a DMF-managed file is the "dmattr" command. This allows you to list specific DMF attributes using the "-a" option, or all of them using the "-l" option.

$ dmattr -l /data/Qxxxx/cheese
      bfid : 547c1c4a000000001080e351
     emask : 160000
   fhandle : 0100000000000018a5f6616c8a5548d500000000084740582515bc1b00000000
     flags : 0
     nregn : 1
     owner : 12345
      path : cheese
    projid : 1234
   sitetag : 0
      size : 124362
     space : 126976
     state : DUL

Please refer to the "dmattr" manual entry for what the attributes mean, and for the other command options.
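
In a shell script it is often enough to know just the state of a file.  The following is a minimal sketch that picks the "state" line out of the "dmattr -l" output shown above.  The function names "dmf_state" and "needs_recall" are made up for this example, and the sketch assumes that your version of "dmattr" prints "name : value" lines in the same layout as the example.

```shell
# dmf_state FILE: print the DMF state (e.g. DUL or OFL) of FILE.
# Assumption: "dmattr -l" prints one "name : value" line per attribute,
# in the same layout as the example output above.
dmf_state() {
    dmattr -l "$1" | awk -F' : ' '$1 ~ /^ *state$/ { print $2 }'
}

# needs_recall FILE: succeed if FILE has to be read back from tape before use.
# Assumption: only OFL and (conservatively) PAR files need a recall.
needs_recall() {
    case "$(dmf_state "$1")" in
        OFL|PAR) return 0 ;;
        *)       return 1 ;;
    esac
}
```

For example, "needs_recall /data/Qxxxx/cheese && dmget /data/Qxxxx/cheese" would only issue a recall when the file is not already online.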

Finding out how much file space you are using

If you need to find out how much space your files are occupying, then the standard "du" command will only show you how much disk space is being used for the online copy. To get the size of offline (state OFL) files, you can use "dmdu".

This example illustrates this for a subtree where all of the large files have been migrated and are currently in OFL state.

$ du /data/Qxxxx/data/HG01187/alignment/
52	    /data/Qxxxx/data/HG01187/alignment/
$ dmdu /data/Qxxxx/data/HG01187/alignment/
43395850    /data/Qxxxx/data/HG01187/alignment/

Please note that "dmdu" is significantly slower than "du", since it needs to query the DMF services to retrieve file attributes for each file in the directory tree that it is traversing.

Finding files

Experienced UNIX and Linux users will probably be familiar with the "find" command.  This is a "swiss army knife" utility for traversing directory trees and locating files based on their metadata.

DMF provides a version of the "find" command called "dmfind".  This works the same way as regular "find", but it also allows you to include various DMF attributes in the query.  For instance, the "-state <state>" option allows you to match objects with a given DMF state.  The following would search for all "tar" files that are in OFL state.

$ dmfind /data/Qxxxx -name \*.tar -state OFL

(There are numerous tutorials and examples on the internet that illustrate how "find" can be used.  These can be generalized to "dmfind".)

Pushing files to tape

A typical DMF installation consists of tens or hundreds of TB of tier 1 disk storage, backed by a much larger amount of tape storage. If the system is shared by multiple users, it is quite possible for a "data hungry" user to monopolize the tier 1 disk space by accessing many files in a relatively short period of time. It is common for the administrators to implement disk quotas to prevent this, but this can leave users at the mercy of the DMF system's decision-making algorithms to "flush" files that no longer need to be online.

One way to deal with this is to use the "dmput" command to force files to be written to tape and push them out of the tier 1 cache.  The "dmput" command is really quite simple:

  • The "-r" option tells DMF to release the disk space of the files after they have been written to tape.  (The files are not deleted; they simply go offline.)  If not supplied, the decision to release the disk space is left to the DMF algorithms.
  • The "-w" option tells DMF to wait until all files have been written to tape before returning.  If not supplied, "dmput" returns immediately and the tape writes (and any subsequent release of disk space) happen asynchronously.
  • The command takes as arguments the list of file pathnames to be processed.  If no arguments are provided, the "dmput" command reads pathnames from standard input.

Examples:

This example writes files "report1" and "report2" to tape and then releases their disk space.

$ dmput -r -w report1 report2

This example schedules all new or updated files to be written to tape ... but does not release their disk space.

$ dmfind /data/Qxxxx -state REG -type f | dmput
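
When you have finished working on a tree, the two examples above can be combined to push the whole tree out of the cache.  The following is a sketch only: the function name "release_tree" is made up for this example, and it assumes that pathnames do not contain newlines (since "dmput" reads one pathname per line).

```shell
# release_tree DIR: write every REG or DUL file under DIR to tape (if it is
# not there already), release its disk space, and wait until that is done.
# "release_tree" is a hypothetical helper name, not a DMF command.
release_tree() {
    {
        # New or updated files that are not yet on tape.
        dmfind "$1" -type f -state REG
        # Files that are on tape but still occupy disk space.
        dmfind "$1" -type f -state DUL
    } | dmput -r -w
}
```

For example, "release_tree /data/Qxxxx/data/HG01187" would leave all migratable files in that subtree in OFL state.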

Prefetching files from tape

As mentioned previously, if an application naively attempts to read a sequence of files from a DMF-managed file system, and those files have been migrated to tape, then it is likely to generate a stream of individual requests to retrieve files.  The application is liable to perform badly, and the DMF load is liable to interfere with other users.

Instead of doing this, it is typically a better idea to retrieve all of the files that the application is going to read ahead of time.  Specifically, if DMF knows ahead of time that it has a number of files to retrieve, it can schedule the fetches to minimize tape loads and tape seeks.

The simple way to request that a file is retrieved is to use the "dmget" command.  The command is even simpler than "dmput".  There is no "-w" option because "dmget" always waits, and there is no "-r" option.  Instead, you simply provide a list of file pathnames on the command line or on standard input.

Examples:

This example fetches files "report1" and "report2" from tape.

$ dmget report1 report2

This example retrieves from tape all offline files in a given tree.

$ dmfind /data/Qxxxx/data/HG01187/alignment/ -state OFL | dmget

Batching files

Let us suppose that I wanted to process all of the files in the "/data/Qxxxx" tree.

  • If I run an application over all of the files without prefetching them, the application will perform badly, as described above.
  • If I prefetch all of the files, I could run into a different class of problems:
    • If the disk space required to hold the files is large, my prefetching is liable to interfere with other users by pushing their files offline.
    • If the disk space required exceeds my disk space quota, the prefetch is liable to fill my quota and then fail.
    • If the disk space required is more than is available, DMF is liable to push out some of my files that it has just fetched ... before I can run the application on them.

The answer to this conundrum is to process the files in batches.  You need to divide the set of files to be processed into batches containing a manageable number and size of files. Then you set up a script to do the following:

  1. Use "dmget" to selectively prefetch the files in a batch.
  2. Process the prefetched files.
  3. Use "dmput -r -w" to force them offline again.
  4. Repeat ...

If your files are naturally organized into a number of subdirectories, then you could use the subtrees as your batch boundaries.
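
The steps above could be scripted along the following lines.  This is a sketch under some assumptions: the function name "process_in_batches" is made up for this example, each immediate subdirectory of the top directory is treated as one batch, and the processing command only reads the files (so that recalled files end up in DUL state and can simply be released again).

```shell
# process_in_batches TOP COMMAND [ARGS...]
# Runs "COMMAND ARGS... BATCHDIR" once for each immediate subdirectory of TOP.
# "process_in_batches" is a hypothetical helper name, not a DMF command.
process_in_batches() {
    top=$1
    shift
    for batch in "$top"/*/; do
        [ -d "$batch" ] || continue
        # 1. Prefetch the offline files of this batch in a single request.
        dmfind "$batch" -type f -state OFL | dmget || return 1
        # 2. Process the prefetched files.
        "$@" "$batch" || return 1
        # 3. Release the disk copies again, waiting until DMF has finished.
        #    Assumption: the processing did not modify the files, so the
        #    recalled files are now in DUL state.
        dmfind "$batch" -type f -state DUL | dmput -r -w || return 1
    done
}
```

For example, "process_in_batches /data/Qxxxx/data my_analysis" would run the (hypothetical) program "my_analysis" on one subdirectory at a time.  If a single subdirectory is too big for your quota, you would need to choose smaller batches.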

Using rsync on a DMF file system

The "rsync" command is a useful utility for moving large numbers of data files from one place to another.  If you use "rsync" for copying entire trees to or from a DMF-managed file system, then the previous advice applies in the obvious way:

  • When copying files from the DMF file system, prefetch using "dmget" so that "rsync" won't stall while waiting for the files to be retrieved.
  • When copying to the DMF file system, force out files using "dmput" if they don't need to be online.
  • When copying large numbers of files, use batching to avoid cache and quota issues.

When you are using "rsync" to update an existing copy, then there are some additional things that you need to do to make the transfer as smooth as possible.

  • If the rsync program encounters a file that already exists at the destination, it needs to figure out if the file needs to be recopied. The program can do this in two ways; by comparing file sizes and timestamps, or by comparing file checksums. In the DMF case, calculating the checksum of an OFL file entails retrieving the file from tape.
  • When the rsync program decides that it does need to recopy a file, it will try to optimize the transfer using a delta-transfer mechanism. In the DMF case, the old copy of the file (i.e. the one that is about to be overwritten) needs to be retrieved from tape to calculate the deltas.

In short, when using "rsync" to update a file tree, when the source or destination is on DMF, you should avoid the "--checksum" ("-c") option, and you should use the "--whole-file" ("-W") option to disable the delta-transfer mechanism.
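
Putting this advice together for a copy out of DMF might look like the following sketch.  The function name "pull_from_dmf" is made up for this example, and it assumes that the whole source tree fits in the disk cache and in your quota; if not, the batching advice applies.

```shell
# pull_from_dmf SRC DEST: copy the tree SRC (on a DMF-managed file system)
# to DEST.  "pull_from_dmf" is a hypothetical helper name.
pull_from_dmf() {
    # Recall all offline files in one request, so that rsync does not
    # stall on each file in turn.
    dmfind "$1" -type f -state OFL | dmget || return 1
    # -a  archive mode (recursive, preserves times and permissions)
    # -W  --whole-file: disable the delta-transfer mechanism
    # No -c, so rsync compares sizes and timestamps rather than checksums.
    rsync -a -W "$1" "$2"
}
```

For example, "pull_from_dmf /data/Qxxxx/data/HG01187/ /scratch/HG01187/", where "/scratch" stands for some file system outside DMF.  The usual "rsync" rule about a trailing slash on the source directory applies.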

Small files

Handling lots of small files is problematic for DMF.  (Actually, it is problematic in other situations too, but the problem is particularly bad for DMF.)  The root of the problem is that small files consume a lot of DMF resources, relative to the number of bytes of data that they hold.

  • Data is stored in multiples of "disk blocks", and a small file that is less than one disk block in size still occupies a whole disk block.
  • Each file requires an inode to hold the file metadata, and inode storage can be at a premium.
  • Files that are managed by DMF

To mitigate this, DMF installations are often configured with a strong bias against migrating small files to tape.  (Indeed, in some situations, DMF can be configured to completely refuse to do this!) This means that if you have a lot of small files, they are liable to be "resource hungry" and liable to consume more of your quota'd resources; e.g. online disk space and inodes.

My advice would be to avoid organizing your data as lots of small files.  If possible change your application to use larger files; e.g. in the form of an archive (e.g. tar, zip, etc) or a flat-file database (e.g. Berkeley DB, SQLite, etc). If that is not possible, then consider staging the files; i.e. storing them on the DMF-managed file system as an archive, and then unpacking the archive onto your local file system prior to use.
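
The staging approach could be scripted as in the following sketch, which uses "tar".  The function names "pack_dir" and "stage_archive" are made up for this example, as are the pathnames in the usage note.

```shell
# pack_dir DIR ARCHIVE: store the directory DIR as a single tar file at
# ARCHIVE (which would be on the DMF-managed file system).
pack_dir() {
    tar -cf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# stage_archive ARCHIVE WORKDIR: recall ARCHIVE from tape if necessary and
# unpack it under WORKDIR (which would be on a local file system).
stage_archive() {
    dmget "$1" || return 1
    mkdir -p "$2" && tar -xf "$1" -C "$2"
}
```

For example, "pack_dir ~/results/run42 /data/Qxxxx/run42.tar" stores the directory as one large file, and "stage_archive /data/Qxxxx/run42.tar /tmp/work" later unpacks it as "/tmp/work/run42".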