Chaos

Outside the systems administration field, it is not well known that computer systems management is generally ad hoc. Administration tends to be reactive, driven by monitoring and helpdesk trouble calls. Little effort goes into proactively automating the management of systems in order to prevent problems and expedite solutions.

The economic and cultural impact of this situation is real -- in a global economy dependent on information technology, the friction generated by ineffective IT practices influences productivity, jobless rates, quality of life, and growth of emerging markets.

The business impact is real. Most direct IT expense is labor, not hardware. Organizations bleed labor hours in IT, and lose revenue to rollout delays and unscheduled service outages.

There is hope, and there is progress. The authors and members of this web site are current and former systems administrators, and are pioneers in the field of Infrastructure Architecture.

A well-run IT department is like air -- it's taken for granted. It smoothly provides services that are always reliable and never delivered late; system performance is superb, and mail and files are never lost. Its members are not well known throughout the company, because there is little need to interact with them. Organizational dynamics are such that, in most cases, a well-run IT department slowly loses influence and funding.

On the other hand, an average IT department is always visible, because its administrators are frequently at your desk, fixing something. You see them working hard, while you wait. When the corporate mail server dies, you know that they are working hard through the night to get it going again, trying to recover most of your mail. They get credit for this, and the next time they ask for more funding, you'll think of these heroic efforts.

This is a huge disincentive to doing things right in the first place. "Don't fix it until it breaks" is equivalent to not changing the oil in your car until the engine seizes up. When a major server crashes, you lose money. And yet that's the way most IT organizations operate at the line level. Even their own CIOs aren't usually aware of this. When we tell them, the initial reaction is usually disbelief -- "things can't be that bad, not on my watch". The next reaction is always "well, I'm sure my people know what they're doing, and have the problem well in hand". The reality at the line level in these organizations is always very different.

The line-level systems administration community has become so resigned to this reality that, over the course of decades, a global subculture has grown up around the black humor of it -- see BOFH definition, BOFH Sample, BOFH Complete, UK, Poland, Bulgaria, Australia, Belgium, Adminspotting, and more. How is it that an entire career field can become so thoroughly jaded?

In the financial industry, generally accepted accounting practices call for double-entry bookkeeping, a chart of accounts, budgets and forecasting, and repeatable, well-understood procedures such as purchase orders and invoices. An accountant or financial analyst moving from one company to another will quickly understand the books and financial structure of their new environment, regardless of the line of business or size of the company.

There are no generally accepted administration procedures for the IT industry. Because of the ad-hoc nature of activity in a traditional IT shop, no two sets of IT procedures are ever alike. There is no industry-standard way to install machines, deploy applications, or update operating systems. Solutions are generally created on the spot, without input from any external community. The wheel is invented and re-invented, over and over, with the company footing the bill. A systems administrator moving from one company to another encounters a new set of methodologies and procedures each time.

This lack of industry standards is due in part to a lack of academic standards. Systems administrators are generally self-taught -- there are few degree programs focused on systems administration. University computing infrastructures are generally not tasked with replicating the requirements or operating procedures of a mission-critical business environment.

This means that the people who are drawn to systems administration tend to be individualists. They are proud of their ability to absorb technology like a sponge, and to tackle horrible outages single-handedly. They tend to be highly independent, deeply technical people. They often have little patience for those who are unable to also teach themselves the terminology and concepts of systems management. This further contributes to failed communications within IT organizations.

Several years ago, some of us realized we had a choice -- stay immersed in the stagnant waters of conventional systems administration, or look for better ways. We looked.

Over the years since then, we have slowly built a set of practices, published as open standards and guidelines. These measures have been developed with the assistance of a growing List and Community of like-minded individuals. Our employers, clients, and readers have found these practices can lower the cost of providing IT infrastructure, increase data center scalability and efficiency, and make for very rapid deployments and changes.

These architectural and administrative choices revolve around bringing the concepts of automated systems administration, mass production, and mass customization to IT organizations.

We've implemented these concepts ourselves to build some of the largest financial trading floors in New York and London, to manage supercomputers at NASA, to deploy internet data centers for global network-layer providers and Silicon Valley startups, and to provide business continuity for a certain heavy equipment manufacturer. In each of these cases, these principles have proven to be crucial in cleaning up the morass of IT.

Checklist

Version Control
Gold Server
Host Install Tools
Ad Hoc Change Tools
Directory Servers
Authentication Servers
Time Synchronization
Network File Servers
File Replication Servers
Client File Access
Client O/S Update
Client Configuration Management
Client Application Management
Mail
Printing
Monitoring
© Copyright 1994-2007 Steve Traugott, Joel Huddleston, Joyce Cao Traugott
In partnership with TerraLuna, LLC and CD International