The Pankian Metaphor

re: lots of hard work and experience. one of the things done was detailed vulnerability analysis of tcp/ip ... looking both at standards documents (RFCs) and code examination. for a little drift ... see my IETF RFC index.

the analysis identified several operational things ... i.e. not coding bugs ... design/implementation problems that could result in live operational failures. having studied both standards and code from the standpoint of detailed vulnerability analysis helped later ... there is a story about a problem that the largest (at the time) online service provider was experiencing.

part of the experience was having developed an online failure diagnostic tool in the early 80s that also attempted to build a library of failure signatures that could be automatically scanned for (as part of the diagnostic process). this also evolved into looking for common failure features and/or characteristics for grouping types of failures. this was widely deployed thru-out the corporation ... both for internal datacenters and people responsible for shooting customer problems.

also part of the experience was based on having redone the I/O supervisor for the disk engineering and product test labs in bldg. 14 & 15. they had development hardware that operated in very strange ways and potentially generated more errors in a few minutes than normal production devices would generate in a year. it was a very operating-system-hostile environment. attempts at using a standard MVS system in that environment resulted in 15 minute MTBF testing just a single development device.
I had to completely rethink the whole operating system approach with the assumption that it was operating in an extremely hostile environment ... and eventually produced a bullet-proof operating system where they could concurrently test half-dozen or more development devices.

also in that time-frame we were enhancing the HONE time-sharing service (internal time-sharing service that provided world-wide support for all corporate marketing, sales, and field people). all the US HONE datacenters had been consolidated in northern cal. in the mid-70s (and there were also a growing number of cloned datacenters world-wide). there was a lot of work that went into attempting to make the HONE service continuous. after a couple years, because of environmental concerns (earthquakes), the US HONE center was replicated, first in Dallas (and then a 3rd in boulder), with fall-over and load-balancing across the distributed centers.

in the same period, there was similar work on high availability by various other corporations providing commercial time-sharing services using the same platform.