for ha-cmp, besides the distributed lock manager, we had to do cluster membership management ... removal of a member on possible failure as well as re-integrating a member back into the cluster. we were able to draw on decades of experience with mainframe clusters ... as well as some of the vax-cluster history. for the distributed lock manager, we collected information from some of the DBMS vendors that had done vax-cluster based implementations ... and had their list of things done wrong in vms. we had the advantage of a clean slate and starting from scratch.

for cluster membership management there was a heartbeat timer (another name for watchdog) mechanism. a problem then becomes shooting down a cluster member that is believed to have failed. the long-time model has been mainframe loosely-coupled (cluster by any other name), where different processors had shared concurrent access to the same disk drives (dating back to the 60s).

the age-old scenario that you try to handle ... is somebody pushing the processor "STOP" button on the front panel when the operating system is one instruction in front of a disk start i-o operation instruction. the rest of the cluster eventually decides the affected processor has failed ... goes into recovery and take-over ... removes the "failed" processor from the active cluster membership ... and possibly reassigns responsibility for resuming processes that were in progress on the failed complex. the scenario then has the "START" button pressed for the "failed" processor, and it proceeds with the disk start i-o operation.
allowing the successful completion of such a disk start i-o operation may obliterate valid information that was written while the processor was in the stop state. for this scenario ... the cluster mechanism needs a methodology that fences off processors that are believed to have failed (i.e. removed from standard cluster membership).

misc. old ha-cmp reference:

some number of past posts on distributed lock manager (or DLM ... which is unrelated to recent posts concerning DRM):
outer cylinders still faster than inner cylinders?
Public Key Exchange without Digital Certificates
relational data (was: Re: CAS and LL-SC (was Re: High Level Assembler for MVS & VM & VSE))
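as a rough illustration of the two pieces described above (heartbeat-based membership, and fencing off a processor believed to have failed), here is a minimal Python sketch. it is not the ha-cmp implementation. the names SharedDisk, Cluster and FencedError, the 3-second timeout and the block numbers are all hypothetical, and the sketch assumes the fence is enforced at the shared disk rather than by the (possibly stopped) processor itself.

```python
from dataclasses import dataclass, field


class FencedError(PermissionError):
    """Raised when a fenced-off processor attempts disk I/O."""


class SharedDisk:
    """Hypothetical shared disk that enforces fencing at the device.

    The check lives in the disk (controller), not in the processor, so a
    processor that resumes after being declared failed cannot complete
    a stale write.
    """

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self.allowed: set[str] = set()

    def register(self, cpu: str) -> None:
        # Admit a processor's I/O path to the shared disk.
        self.allowed.add(cpu)

    def fence(self, cpu: str) -> None:
        # Cut off a processor's I/O path; later writes from it are rejected.
        self.allowed.discard(cpu)

    def write(self, cpu: str, block: int, data: bytes) -> None:
        if cpu not in self.allowed:
            raise FencedError(f"{cpu} is fenced off")
        self.blocks[block] = data


@dataclass
class Cluster:
    """Hypothetical heartbeat (watchdog) membership manager.

    Times are passed explicitly (e.g. seconds from a monotonic clock)
    so the logic is deterministic.
    """
    disk: SharedDisk
    timeout: float
    last_seen: dict[str, float] = field(default_factory=dict)

    def join(self, cpu: str, now: float) -> None:
        # (Re-)integrate a member: admit it to membership and to the disk.
        self.last_seen[cpu] = now
        self.disk.register(cpu)

    def heartbeat(self, cpu: str, now: float) -> None:
        # A removed member must rejoin explicitly; stray heartbeats are ignored.
        if cpu in self.last_seen:
            self.last_seen[cpu] = now

    def check(self, now: float) -> list[str]:
        # Declare members failed when their last heartbeat is older than the
        # timeout, and fence each one BEFORE recovery/take-over begins.
        failed = sorted(c for c, t in self.last_seen.items() if now - t > self.timeout)
        for cpu in failed:
            del self.last_seen[cpu]
            self.disk.fence(cpu)
        return failed


if __name__ == "__main__":
    disk = SharedDisk()
    cluster = Cluster(disk, timeout=3.0)
    cluster.join("cpu1", now=0.0)
    cluster.join("cpu2", now=0.0)

    # STOP is pressed on cpu1 just before a start i-o; only cpu2 heartbeats.
    cluster.heartbeat("cpu2", now=2.0)
    print(cluster.check(now=4.0))  # ['cpu1']
    disk.write("cpu2", 7, b"update made during take-over")

    # START is pressed: cpu1 resumes its pending start i-o.
    try:
        disk.write("cpu1", 7, b"stale data")
    except FencedError as exc:
        print("rejected:", exc)
    print(disk.blocks[7])
```

the ordering is the point: the survivors fence the failed member before any take-over writes. a heartbeat timeout by itself only lets the rest of the cluster decide that a member has failed; it can't stop a stopped processor from completing its pending start i-o once START is pressed.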
Alt Folklore Computers