home | sitemap | abstract | introduction | chaos | thinking | checklist | migrating | recovery pushpull | cost | career | workshop | isconf | list_and_community | papers | references |
Abstract
When deploying and administering systems infrastructures it is still common to think in terms of individual machines rather than view an entire infrastructure as a combined whole. This standard practice creates many problems, including labor-intensive administration, high cost of ownership, and limited generally available knowledge or code usable for administering large infrastructures. The model we describe treats an infrastructure as a single large distributed cluster or virtual machine. We have found that this model allows us to approach the problems of large infrastructures more effectively. This model was initially developed during the course of four years of mission-critical rollouts and administration of global financial trading floors. The typical infrastructure size was 300-1000 machines, but the principles apply equally as well to much smaller environments. Added together these infrastructures totaled about 15,000 hosts. Further refinements have been added since then, based on experiences at NASA Ames, Level 3 Communications, USWEB/CKS, and various enterprises in the Silicon Valley area. The methodologies described here use UNIX and its variants as the example operating system. We have found that the principles apply equally well, and are as sorely needed, in managing infrastructures based on other operating systems. This is a living document: Revisions and additions are expected and are available at www.infrastructures.org. We also maintain a mailing list for discussion of infrastructure design and implementation issues -- details are available under the List and Community section of the web site. |
|
||||||||
© Copyright 1994-2007 Steve
Traugott,
Joel Huddleston,
Joyce Cao Traugott
In partnership with TerraLuna, LLC and CD International |