An Introduction to Early UNIX Volume and File Structure

Mark Kampe

1. Introduction

By the time Ken Thompson and Dennis Richie wrote UNIX in the 1970s, file systems with fixed-sized blocks and tree-structured directories were already common. A key goal for UNIX was to provide the power and generality of larger timesharing systems (most notably Multics) with a much smaller and simpler implementation. Their original UNIX file system design provided very reasonable functionality with relatively simple mechanisms. Their file system implementatin, like so many other parts of UNIX, emphasized simplicity over functionality. UNIX file systems worked pretty well, and for many years the only changes made to them were to move to larger block sizes (to provide better throughput).

Over the next decade, many users bemoaned some of the functional and performance limitations of the basic UNIX file system. A team of researchers at UC Berkeley decided to address these issues in their (now legendary) Fourth Software Distribution (4BSD). The functionality (long file names and symbolic links) and performance (cylinder clusters and bit-map free lists) improvements were quickly adopted by all of the major UNIX vendors, and the original UNIX file system design went the way of the dinosaurs.

This is description is probably only of interest to people who want to understand the problems that the BSD fast file system enhancements were responding to.

2. Structural Overview

All file systems include a few basic types of data structures:

*** FIX ***

3. Volume Descriptor (Super Block)

5. File Descriptors (I-nodes)

4. Free Space Descriptors (Free Lists)

6. File Naming (directories)

6.1 Symbolic Links

7. Summary

In MVS volumes, each of these data structures is stored in a Data Set Control Block (or more commonly a DSCB). Each DSCB is 140 bytes long, and they all reside in a single contiguous area called the Volume Table of Contents (or more commonly the VTOC). The starting address (cylinder, head, record) of the VTOC can be found at the beginning of the third record of the boot track. The size of the VTOC is described in its first DSCB.

Other than the first DSCB (which is always the volume descriptor), most of the remaining DSCBs in the VTOC can be used for any purpose (e.g. file descriptors and free space descriptors). All of the file system meta-data is confined to the DSCBs in the VTOC. Other than the boot track and the VTOC, the entire remainder of the volume is available to be allocated to user files.

Since all of the meta-data in an MVS volume resides in DSCBs, a study of MVS volume structure inevitably turns into a discussion of the various types of DSCBs. For our purposes there are four important types of DSCBs (there are a few others, but they have rather specialized uses). We can tell type a particular DSCB describes by looking at its format byte. The most interesting types are:

Each of these will be discussed in the following sections.

3. Volume Descriptors (Format 4 DSCB)

All file systems need a volume descriptor that describes the size and layout of the file system. MVS stores this information in a format 4 DSCB. There can only be one format 4 DSCB in a volume, and it must be the first DSCB in the VTOC.

The information included in the volume descriptor DSCB includes:

4. Free Space Descriptors (Format 5 DSCBs)

MVS files are made up of a few (e.g. not hundreds) of chunks of space. Each chunk is contiguous space, and can be read or written with little-or-no head motion. Successive chunks may or may not be close to one another. Each of these separately allocated "chunks" is called an extent. Different files can have different extent sizes. The size of each extent, as well as its alignment (e.g. on a track or cylinder boundary) is chosen by the programmer who creates the file.

If I had a database that I expected to occupy somewhere in the range of 5-25 cylinders, I might allocate space to it in single cylinder extents. When processing the entire database, I might then read a whole cylinder at a time (transferring several megabytes, at very high speed, with only a single seek at the start of the operation). This would enable me to process my database very efficiently.

Because different files can use different extent sizes, MVS volumes support variable partition allocation (with coalescing). They keep track of the (currently) free space with a free list comprised of format 5 DSCBs.

The second DSCB in the VTOC is a format 5 DSCB, which is the start of the free space list on that volume. A format 5 DSCB contains up to 26 free extent descriptions. A free extent description includes the address of the next free extent, and its size (in cylinders and tracks). A free extent can be of arbitrary size, but must be contiguous. If there are more than 26 free extents, there is also a pointer to the next format 5 DSCB in the free list.

When a Format 5 DSCB is empty (no longer points to any free extents) it is recycled by changing its format to 0 ... making it available to be reallocated for some other use. If it becomes necessary to add new nodes to the free list (because the appropriate format 5 DSCB is already full) a new (type 0) DSCB is found, turned into a type 5 DSCB, and inserted into the free list.

5. File Descriptors (Format 1 and Format 3 DSCBs)

Each file on an MVS volume is described by a type 1 DSCB. The contents of a type 1 DSCB includes:

MVS doesn't really have directories. Each file has a unique, true name, which is recorded in the file's type 1 DSCB. These names are all 44 characters long. To find a particular file, it is often necessary to read through all of the type 1 DSCBs in the VTOC until the desired file name is found.

A Format 1 DSCB includes only three extent pointers. If a file has more than three extents of space allocated to it, one or more Format 3 DSCBs will also be associated with the file (to describe the additional extents). Each Format 3 DSCB describes up to 13 additional extents (giving the extent number within the file, and its starting and ending addresses). There is also a pointer to the next Format 3 DSCB ... making it possible to allocate as many extents to a file as desired.

When a new file is created, an unallocated (type 0) DSCB is located, and turned into a type 1 DSCB. As extents are added to the file, they are recorded in the Format 1 DSCB. When the fourth extent is added, another unallocated DSCB is found, and initialized into a Format 3 DSCB to describe the next 13 extents.

When a file is deleted, all of its extents are returned to the free list (coalescing as necessary), and the Format 1 DSCB (and any associated Format 3 DSCBs) are cleared (making them available for reallocation).

6. Volume Index

As mentioned previously, MVS does not really have directories. Filenames may be structured in a way that suggests directories (e.g. USERS.KAMPE.CS111.WEB.DOCS.MVS.HTML) but this is only a naming convention.

DSCBs are allocated in a First-Come-First-Served fashion, so there is no guarantee that the Format 1 DSCBs for related files will be anywhere near one another. Since the order of DSCB allocations is unrelated to file names, finding a desired file by searching through all of the Format 1 DSCBs can be a very slow process. Such searches could be made much faster if we prepared a sorted list of file names.

MVS file systems often include an index or catalog to optimize file opens and other common searches. An MVS volume can be designated (in the volume descriptor) as having an index. If there is an index, it is always named "SYS1.VTOCIX". The index (which can be recreated at any time) contains:

The Volume Index Entry Records (or VIERs) make up the majority of the Index, and these are used to create a sorted list of file names. The VIERs exist in a hierarchy that is very much like directories:

There are, however, a few key differences between the Volume Index Entries in the catalog and true directories:

The Volume Index can be deleted at any time, with no harm to the volume. It can be reconstructed any time, by scanning the VTOC and enumerating all of the Format 1 and Format 5 DSCBs.

7. Summary

From the user's perspective, the big difference between MVS volumes and any other file system is the number of decisions that the user is required to make when creating a file. The user has to declare what the file's record structure will be, choose the most appropriate standard access method, estimate the eventual size of the file, and decide what units should be used for space allocation and disk I/O. Very few other operating systems require users to provide so much input into how the file should be created and used ... but the result (of good choices) is file I/O performance that very few other operating systems can match.

In memory allocation, the primary purpose of variable partition allocation was to minimize internal fragmentation. In the case of MVS volumes, the primary purpose is to give users more control over space allocation ... specifically to make it possible to allocate space in large contiguous extents that can be read and written very efficiently.

In memory allocation, we found that variable partition allocation almost inevitably led to external fragmentation. Over short periods, coalescing can counter this trend ... but ultimately external framgentation is likely to win. The same thing is true of MVS volumes. For this reason, MVS data administrators have to back-up and rebuild their file systems occasionally ... to reallocate space more efficiently, and coalesce all of the available free space.

Because the MVS volume structure is primitive (invented in the 1960s) it is simple. All file system meta-data is stored in a pool of fixed size DSCBs, and file names are directly stored in the file descriptors. If we look at the optional index/catalogs (which were added later to enhance performance) we see data structures that are actually quite similar to the bit-map free lists and hierarchical directories of other file systems. IBM found a way to obtain enhanced functionality and performance, while maintaining upwards compatibility with older on-disk data structures. This is a problem that all file system vendors face.

8. References

IBM MVS/DFP V3R3 System Programming Reference
Document SC26-4567-02. IBM Library Server, book IGG3S300