$Id: layers.html,v 0.6 2003/10/21 04:15:01 stevegt Exp $

Layers of Infrastructure: The Admin Stack

Steve Traugott, TerraLuna, LLC -- stevegt@terraluna.org

Abstract

The OSI reference model for network architecture provides a useful tool for discussion and targeting of network protocol designs. A similar layered reference model for systems architecture and administration is apparent, but seems to lack a formal published description. Those attempts at description which do exist ("client/server", "3-tier", "middleware") tend to be application-centric and do not address the needs of systems administrators and infrastructure architects.

This lack of useful prior art contributes to poorly stated requirements, omission of critical planning elements, and duplication of corrective measures. These factors, coupled with associated misunderstandings in conversation, presentation, and publication, limit the effectiveness of current infrastructure architecture efforts.

This paper attempts to describe one possible layered reference model for infrastructure architecture.

Overview

Just as the OSI reference model describes a layered architecture for networks [osi], a "layer" or "stack" model can be described for systems infrastructures. If recognized during discussion and design, this model can greatly simplify and clarify the development, installation, configuration, and administration process for infrastructures.

It's possible that the need for publication of this model hasn't been recognized before now due to the relatively ad-hoc nature of traditional systems administration. But the growing field of infrastructure architecture and automated systems administration is greatly in need of this rigor [infrastructures].

An awareness of these layers can improve design of automated administration code. For example, operating system configuration and database configuration are different problem domains. Trying to implement an automated tool to do both may not be a good fit. By way of analogy, in the OSI model this might be equivalent to implementing layers 2 (link layer) and 3 (network layer) with one protocol, merging Ethernet MAC addresses and IP addresses into one address space and producing an unsuccessful design.

Systems administrators use portions of this layering in everyday planning already -- for example, we would normally ensure that "Configure DNS" and "Install Operating System" show up in the right order on a project plan.

We might not be as careful about the ordering of "Implement O/S Management Infrastructure" and "Install Application". But experience shows that attempting to test or deploy an application throughout an enterprise without first gaining positive control of the underlying operating systems is like building on shifting sand.
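For illustration, a workable ordering can be derived mechanically from the layer each task touches. The following Python sketch (the task names and layer assignments are invented for the example) orders plan items so that no task precedes the lower-layer work it depends on:

    # Order project-plan tasks by the infrastructure layer each one
    # touches, so that lower-layer work always precedes higher-layer
    # work.  Task names and layer assignments are invented examples.

    from graphlib import TopologicalSorter  # Python 3.9+

    LAYERS = ["hardware", "firmware", "system", "service", "application"]
    RANK = {layer: i for i, layer in enumerate(LAYERS)}

    tasks = {
        "Rack and Cable Servers":                  "hardware",
        "Install Operating System":                "system",
        "Implement O/S Management Infrastructure": "system",
        "Configure DNS":                           "service",
        "Install Application":                     "application",
    }

    # Each task depends on every task that touches a lower layer.
    graph = {
        name: {other for other, l in tasks.items() if RANK[l] < RANK[layer]}
        for name, layer in tasks.items()
    }

    for task in TopologicalSorter(graph).static_order():
        print(f"{tasks[task]:<12} {task}")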

This ordering is becoming more critical for other reasons. As interest in distributed and grid computing grows, many organizations are already well into expensive rollouts of distributed computing frameworks [ipg][globus][condor]. But most of these initiatives are purely application-level in design, planning, and funding, with little or no effort allocated to the need to manage underlying kernel versions, shared libraries, or other prerequisite components. As a result, users of these frameworks quite literally see mixed results, even when using "platform neutral" languages such as Java [javagenes].

The Stack

A workable layer structure for infrastructure architecture seems to consist of five layers: hardware, firmware, system, service, and application (see Fig 1).

These layer names should be relatively clear, with the possible exception of the system layer -- here we drop the word "operating" for conciseness.

A common feature of each layer is that it depends upon proper operation of the layer beneath it. The hardware layer, at the bottom, ultimately handles all I/O.

The application and service layers are partially co-located at the same level. Even though a client application may depend on the proper operation of a remotely located service daemon, for example, it also depends on the local operating system.
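This dependency structure is simple enough to state as data. The sketch below encodes Fig 1 in Python, including the application layer's dual dependency on the local system layer and the (possibly remote) service layer:

    # Encode the stack's dependency rule as data: each layer depends on
    # the layer beneath it, and the application layer additionally
    # depends on the service layer (possibly on a remote host).

    DEPENDS_ON = {
        "application": {"system", "service"},
        "service":     {"system"},
        "system":      {"firmware"},
        "firmware":    {"hardware"},
        "hardware":    set(),       # bottom of the stack; handles all I/O
    }

    def prerequisites(layer):
        """All layers that must operate correctly for `layer` to work."""
        deps = set(DEPENDS_ON[layer])
        for d in DEPENDS_ON[layer]:
            deps |= prerequisites(d)
        return deps

    print(sorted(prerequisites("application")))
    # ['firmware', 'hardware', 'service', 'system']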

A common objection to this layer structure is that "there's no network layer". Segregating network components into their own layer is traditional but misleading, and implies a distinction which does not exist. Network devices such as switches and routers are normally only an aggregate of (at minimum) the hardware and firmware layers shown here. More complex network devices, including high-end routers and those devices built from general-purpose hosts, tend to also incorporate system, service, and application layers.

[paragraph about how dependency of hosts on network fits into this model]

The OSI "physical" layer 1 is a subset of the infrastructure hardware layer; passive networking components such as cable and fiber are hardware layer components.


Fig 1: The Infrastructure Stack [ to be replaced with a real drawing ]

-------------------------
| Application __________|
|             | Service |
-------------------------
| System                |
-------------------------
| Firmware              |
-------------------------
| Hardware              | <----> [ icon showing external infra/user/world]
-------------------------

Hardware Layer

Components of this layer cannot be altered in the field in any way using software control. This layer depends on proper operation of external infrastructures such as the power grid. All higher layers depend on this layer.

This layer includes mechanical switches, hard-wired circuitry, wires, cables, non-programmable discrete components, disk drive controller circuitry, physical memory, and other physical infrastructure which is either non-programmable or provides volatile storage.

User interaction with this layer is limited to physical manipulation of keyboard buttons, power switches, and audio and video I/O devices. However, all interaction between the user and all higher layers takes place via this layer.

Administrator interaction with this layer is more comprehensive, and includes physical replacement of components. By definition, replacement or upgrade of this layer always includes physical labor proportional to each unit managed.

Distinctions between local and external hardware infrastructure are critical when determining administrative and disaster recovery procedures. For instance, locally-administered power sources, such as generators and uninterruptible power supplies, are components of this layer. Power sources not under local control, such as public utility power, are not. Locally-administered cooling, such as that provided by a dedicated HVAC unit, is a component of this layer. Cooling not under local control, such as campus-wide chillers, is not.
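One way to keep this distinction explicit is to record it in the disaster recovery inventory itself. A minimal sketch (the component inventory is invented):

    # Split hardware-layer dependencies into locally-administered
    # components (ours to recover) and external infrastructure (covered
    # only by contract or contingency).  The inventory is invented.

    LOCAL = {"UPS", "generator", "dedicated HVAC", "rack cabling"}
    EXTERNAL = {"utility power", "campus chillers", "telco circuit"}

    def recovery_owner(component):
        if component in LOCAL:
            return "site recovery procedure"
        if component in EXTERNAL:
            return "external-provider contingency plan"
        return "unclassified -- resolve before the DR plan is complete"

    for c in ("generator", "utility power", "SAN fabric"):
        print(f"{c:>15}: {recovery_owner(c)}")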

Firmware Layer

This layer consists of field-programmable hardware components which each serve a dedicated purpose. This dedicated purpose is chosen by the vendor of the hardware layer. This layer depends on proper operation of the hardware layer. All higher layers depend on proper operation of this layer.

This layer includes field-replaceable microcode, boot configuration parameters stored in CMOS, and field-replaceable boot loader code not stored on disk.

User interaction with, and awareness of, this layer is almost non-existent, and is usually limited to the audible and visible output of power on self tests.

Administrator interaction with this layer usually consists of configuration of boot parameters and infrequent upgrade of firmware code.

Administration of this layer can be automated to some degree, reducing labor costs. How much automation is possible depends on the API and license policies of the firmware vendor. Executable code and data for this layer are stored in non-volatile memory. Address space within this layer is allocated by the hardware vendor, and the starting address of allocated blocks is generally hardwired into the hardware layer. The API for accessing this layer is often not published by the vendor, and the software for accessing this layer is generally vendor supplied. In some cases, the storage space for the access software itself is non-programmable and is included in the hardware layer.
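Where a scriptable interface does exist, even minimal automation removes per-unit labor. The sketch below assumes a hypothetical command-line flash utility, "acme-fwtool"; real vendor tools, flags, and output formats vary widely:

    # Upgrade firmware only when the installed revision is out of date.
    # "acme-fwtool" is a hypothetical vendor utility used purely for
    # illustration; its flags and output format are assumptions.

    import subprocess

    TARGET_REV = "4.17"

    def installed_rev():
        # Assume the tool prints a line like "revision: 4.12".
        out = subprocess.run(["acme-fwtool", "--query"],
                             capture_output=True, text=True, check=True)
        return out.stdout.split("revision:")[1].strip()

    if installed_rev() != TARGET_REV:
        subprocess.run(["acme-fwtool", "--flash",
                        f"/export/firmware/acme-{TARGET_REV}.img"],
                       check=True)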

System Layer

This layer consists of the software required to control and communicate with the firmware layer, in order to provide a common set of standard, publicly documented system calls for higher layers to build upon. This layer depends on proper operation of the firmware layer. All higher layers depend on proper operation of this layer.

This layer typically includes the disk resident boot loader code, kernel, device drivers, and configuration files for all of the above. These components are generally published together as part of an operating system distribution.

User interaction with this layer via lower layers is extensive. User awareness of the boundaries of this layer, however, is normally limited to knowing the name of the operating system itself. User perception tends to blend the operating system layer into the higher layers which it supports. This situation creates confusion which is extremely damaging in terms of management and funding of administration efforts for all layers.

Administrator interaction with this layer is significant, but not dominant. Our estimate is that, in most environments, this layer consumes only ten percent of administrator time. This interaction includes, but is not limited to, installation, configuration, and upgrade of the components listed above.

Administration of this layer is capable of being highly automated, greatly reducing labor costs. This automation is rarely performed, due to low user appreciation of the need. The degree of automation possible is in direct proportion to the percentage of system calls which are publicly documented. UNIX and derivative operating systems enjoy 100 percent publication of the system call interface, while the limited publication of Windows system calls precludes fully automated management of the Windows operating system.
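The flavor of this automation is straightforward: desired state, declared once, applied idempotently. A minimal sketch in Python (the file path and parameter are arbitrary examples):

    # Idempotently enforce one desired line in a system configuration
    # file -- the basic move of automated system-layer administration.
    # The path and parameter are arbitrary examples.

    from pathlib import Path

    def ensure_line(path, key, value):
        """Return True if the file was changed."""
        lines = path.read_text().splitlines() if path.exists() else []
        desired = f"{key} = {value}"
        kept = [l for l in lines if not l.startswith(key)]
        if desired in lines and len(kept) == len(lines) - 1:
            return False                    # already in desired state
        kept.append(desired)
        path.write_text("\n".join(kept) + "\n")
        return True

    if ensure_line(Path("/etc/sysctl.conf"), "net.ipv4.ip_forward", "0"):
        print("configuration corrected")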

Service Layer

This layer consists of the backend and central services which provide computation and data storage for the application layer. All communications between this layer and any other layer or the outside world are via the operating system layer's system call interface. This layer depends on proper operation of the system layer. All higher layers depend on proper operation of this layer.

In a client-server environment, this layer is the "server". In a 3-tier environment, this layer is the "middleware".

This layer includes, but is not limited to, database, mail, web, directory, compute, and file servers. Web "application servers", while badly named, also belong in this layer. Note that login servers, such as the telnet and ssh daemons, also fall into this layer. All members of this layer are characterized by the fact that users communicate with them through a client application of some sort, even if it is only "telnet".

Direct user interaction with this layer is virtually nonexistent -- users interact with client software instead of service software. However, user awareness of this layer is extremely high, particularly in thin-client environments such as the web.

The extremely high user awareness of this layer, coupled with the extremely low user awareness of the operating system and firmware layers, tends to create a perception that the "server" includes the operating system, firmware, and hardware. This perception is so widespread that it has long permeated the administrative community as well.

This careless use of terminology creates confusion -- when we say "the database server is down", do we mean the MySQL daemon, or do we mean the operating system or hardware device on which it runs? The damage this confusion creates propagates all the way up the chain of command and has direct impact on IT funding in most organizations.

Administrator interaction with this layer is high, in our estimate consuming around one-half to two-thirds of administrator time. The actions performed are similar to those performed on the operating system layer, and in many cases the two categories of activity are lumped together in planning. This is risky -- administration of the service layer cannot be properly performed until the operating system is correctly configured, and management systems are in place. Project timelines often fail to recognize this, resulting in schedule and cost overruns.

Administration of this layer is capable of being highly automated, greatly reducing labor costs. This automation has historically seldom been performed; automation of this layer is difficult without first automating management of the operating system layer. Trying to automate service layer management without first automating operating system management is like building on shifting sand -- prerequisites cannot be properly specified.

The degree of automation possible is in direct proportion to the degree of automation present in the operating system layer. Automation of the service layer is also highly influenced by the percentage of service configuration which can be performed via vendor-neutral code. A recent trend in Java-based services, for example, includes an increase in GUI installation tools and binary configuration files. To be automated, the installation and configuration process for these services must be reverse-engineered.
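By contrast, a service whose configuration is plain text can be generated entirely from vendor-neutral code. A minimal sketch (the file name and parameters are invented):

    # Render a service's plain-text configuration from a template --
    # the property that makes service-layer automation cheap.  The
    # file name and parameters are invented for illustration.

    from string import Template

    CONF = Template(
        "listen_port = $port\n"
        "data_dir    = $data_dir\n"
        "max_clients = $max_clients\n"
    )

    params = {"port": 5432, "data_dir": "/var/lib/db", "max_clients": 100}

    with open("db.conf", "w") as f:
        f.write(CONF.substitute(params))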

Application Layer

This layer consists of the software components which users predominantly interact with. All communications between this layer and the outside world take place via system calls at the operating system layer, even when the application is "talking" to a service. In a client-server environment, this layer is the "client". This layer depends on proper operation of both the system and service layers.

This layer includes, but is not limited to, office productivity tools, compilers, mail readers, web browsers, shell command interpreters, filesystem and file manipulation programs, network control utilities, configuration files used by all of the above, and virtually all shared libraries.

This layer includes a great many tools and utilities normally provided as part of the operating system distribution, but which are not part of the kernel, device drivers, or other elements defined above as part of the operating system layer. In conversation, in vendor publications, and throughout the entire computer industry, application components provided as part of an operating system distribution are generally lumped in with "the operating system". A well-known recent example of this is the lumping together of the GNU applications and tools with the Linux kernel and calling the collection "Linux".

This contextual mixing of components from dissimilar layers causes great confusion. This confusion has global economic impact, and regularly results in ill-informed business plans, disrupted court cases, and poorly-defined laws. In many cases, this confusion is intentional, in order to manipulate public perception. The case of Microsoft insisting that their Web browser and their operating system are inseparable is a recent example.

User interaction with the application layer (via lower layers) is dominant. This is true to such a degree that users tend to discount the presence or boundaries of the lower-level layers. This misconception is extremely damaging in terms of organizational planning, timelines, funding, and expectations.

Administrator interaction with this layer is high, but primarily consists of manipulation of the service and operating system layers using a small subset of the application layer. This subset includes, for example, file manipulation utilities, network configuration and monitoring tools, and dedicated, vendor-supplied applications which exist only to manipulate components of the operating system or firmware layer. On UNIX derivatives, these include the shutdown, format, and ifconfig commands. In Windows, these include the applications accessible via the control panel icons.

Administration of the application layer is capable of being highly automated. Automation of this layer is problematic without first automating the operating system and service layers, due to the need to manage application prerequisites and protocol versions.
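A first step is to refuse deployment until those prerequisites are verifiably satisfied. A sketch (the component names and version numbers are invented examples):

    # Refuse to deploy an application until its system- and
    # service-layer prerequisites are satisfied.  Component names and
    # version numbers are invented examples.

    REQUIRED = {"kernel": "2.4.20", "libc": "2.3.2", "db-protocol": "3.0"}
    INSTALLED = {"kernel": "2.4.18", "libc": "2.3.2", "db-protocol": "3.0"}

    def at_least(installed, required):
        def nums(v):
            return [int(x) for x in v.split(".")]
        return nums(installed) >= nums(required)

    missing = [name for name, req in REQUIRED.items()
               if not at_least(INSTALLED.get(name, "0"), req)]

    if missing:
        raise SystemExit(f"deploy blocked; upgrade first: {missing}")
    print("prerequisites satisfied; safe to deploy")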

Like the service layer, automated management of applications is dependent on how well application installation and configuration can be performed using vendor-neutral code. For example, fully automated management of the applications installed on a Windows machine is expensive, due to the non-scriptable GUI installation and configuration process inherent to these applications. This problem was recognized by the trade press very early during the adoption of enterprise use of Windows [TODO show references], and has yet to be fully remedied.

Automated management of UNIX applications tends to be very straightforward, due to the vendor-neutral, open configuration file formats and non-GUI installation procedures these application vendors tend to offer. This allows great reduction of administrative expenses for organizations using this operating system.
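The payoff is that one release can be pushed, unattended, to any number of hosts. A sketch using standard remote-copy and remote-shell tools (host names and paths are invented):

    # Push one application release to many hosts, unattended --
    # possible precisely because the install is non-GUI and the
    # configuration is plain text.  Host names and paths are invented.

    import subprocess

    HOSTS = ["web01", "web02", "web03"]
    TARBALL = "/export/dist/app-1.2.tar.gz"

    for host in HOSTS:
        subprocess.run(["scp", TARBALL, f"{host}:/tmp/app.tar.gz"],
                       check=True)
        subprocess.run(["ssh", host,
                        "tar -C /usr/local -xzf /tmp/app.tar.gz"],
                       check=True)
        subprocess.run(["scp", "app.conf",
                        f"{host}:/usr/local/app/etc/app.conf"], check=True)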

A recent disturbing trend, however, can be seen in some UNIX application vendors' product offerings. Some vendors are beginning to offer GUI-only installs, in an attempt to "be like Windows". These efforts prevent widespread enterprise management of their applications. We believe these vendors may initially appeal to IT managers who prefer to use Windows on their desktop, but the lack of scalability and repeatability of this installation method will limit the reliability of these applications.

Acknowledgments

Basil Hashem described a slightly different "stack" in a meeting with Alain Mayer and myself. In that variant, there was no firmware layer, and instead of a generic service layer there was a dedicated database layer between application and system. Alain and Basil are both with CenterRun, Inc. [who else was at this meeting? Jeff Kroll]

References

[condor] The Condor Project for High-Throughput Computing, http://www.cs.wisc.edu/condor/

[globus] The Globus Project for Computational Grids, http://www.globus.org/

[infrastructures] Infrastructures.Org, Automated Systems Administration and Infrastructure Architecture, http://www.infrastructures.org/

[ipg] Information Power Grid, NASA's grid computing effort, http://www.ipg.nasa.gov/

[javagenes] JavaGenes and Condor Cycle-Scavenging Genetic Algorithms (NASA Technical report NAS-00-006), Al Globus, Eric Langhirt, Miron Livny, Ravishankar Ramamurthy, Marvin Solomon, Steve Traugott, Java Grande 2000, sponsored by ACM SIGPLAN, San Francisco, CA, June 3-4, 2000, http://www.nas.nasa.gov/Research/Reports/Techreports/2000/nas-00-006.html

[osi] OSI Reference Model: The ISO Model of Architecture for Open Systems Interconnection, Zimmermann, H., IEEE Transactions on Communications, 1980