One or two CPUs the pros & cons 3818
couple previous postings in this thread
minor topic drift: for a long time the cornerstone of SMP operation was the compare-and-swap instruction. at the science center, charlie had been working on SMP efficiency and fine-grain locking with CP67 on the 360/67. He invented the compare-and-swap instruction (mnemonic chosen because CAS are charlie's initials). the first couple trips to POK trying to get compare-and-swap into the 370 architecture were not successful. we were told that the mainstream POK operating systems didn't care about CAS ... that they could perfectly well get by with TS (test-and-set). In order to get CAS included in the 370 architecture ... a non-SMP application for CAS would have to be created. Thus were born the descriptions about how to use various flavors of CAS in enabled, multi-threaded application code (whether running on single-processor or SMP, multiprocessor configurations). The original descriptions were part of the instruction programming notes ... but in later principles of operation were moved to the appendix.
misc. past posts on smp, compare-and-swap, scale-up, etc
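as a rough illustration of the multi-threaded application usage described above, here is a minimal sketch of a compare-and-swap update loop. it is a simulation only: python has no CAS instruction, so the hypothetical `Word` class uses an internal lock purely to stand in for the atomicity the hardware instruction provides. the names `Word`, `cas_add` and `run_demo` are made up for this example and are not taken from the principles of operation.

```python
import threading


class Word:
    """A storage word with a simulated compare-and-swap.

    The internal lock only models the atomicity of the hardware
    instruction (an assumption of this sketch); callers never lock.
    """

    def __init__(self, value=0):
        self.value = value
        self._hw = threading.Lock()  # stands in for hardware atomicity

    def compare_and_swap(self, expected, new):
        """Store `new` only if the word still holds `expected`.

        Returns True if the store happened, False if another thread
        changed the word in the meantime.
        """
        with self._hw:
            if self.value == expected:
                self.value = new
                return True
            return False


def cas_add(word, delta):
    """Fetch, compute, compare-and-swap; retry if another thread interfered."""
    while True:
        old = word.value
        if word.compare_and_swap(old, old + delta):
            return old + delta


def run_demo(threads=4, per_thread=1000):
    """Have several threads bump one shared counter without a lock of their own."""
    counter = Word(0)

    def worker():
        for _ in range(per_thread):
            cas_add(counter, 1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_demo())
```

the point of the retry loop is that the updating thread never holds a lock while it computes the new value, so it can be interrupted at any point without blocking anybody else. with test-and-set, the thread would first have to acquire a lock bit, and being interrupted while holding it stalls every other thread that wants the same word.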
tightly-coupled tends to assume extremely fine grain communication, and the coordination overhead reflects that. loosely-coupled tends to have much coarser grained coordination. given that your workload can accommodate coarser grained coordination ... a few 20-processor complexes in a loosely-coupled environment ... may, in fact, provide overall better thruput than a single 60-processor operation (where the incremental benefit of each additional processor may be getting close to 1/3rd of a single processor by the time you hit a 32-processor configuration).
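the trade-off above can be put into a back-of-envelope model. the sketch below is an assumption-laden toy, not measured data: it assumes the incremental benefit of each added processor decays geometrically, calibrated so that the 32nd processor contributes 1/3rd of a uniprocessor, and it assumes a made-up `coupling_efficiency` factor for the loosely-coupled overhead. all function and parameter names are hypothetical.

```python
def decay_ratio(n_ref=32, benefit_at_ref=1 / 3):
    """Per-step ratio r such that the n_ref-th processor adds benefit_at_ref.

    Geometric decay is an assumption of this sketch, not a measured curve.
    """
    return benefit_at_ref ** (1.0 / (n_ref - 1))


def incremental_benefit(n, r):
    """Benefit of the n-th processor, in units of one uniprocessor (n >= 1)."""
    return r ** (n - 1)


def smp_throughput(n, r):
    """Total thruput of one n-way tightly-coupled complex."""
    return sum(incremental_benefit(k, r) for k in range(1, n + 1))


def cluster_throughput(complexes, n_per_complex, r, coupling_efficiency=1.0):
    """Thruput of several loosely-coupled complexes.

    coupling_efficiency is a hypothetical fudge factor (0..1) for the
    coarse-grained coordination cost between complexes.
    """
    return complexes * smp_throughput(n_per_complex, r) * coupling_efficiency


r = decay_ratio()
single_60 = smp_throughput(60, r)
three_by_20 = cluster_throughput(3, 20, r, coupling_efficiency=0.9)

if __name__ == "__main__":
    print(f"single 60-way:   {single_60:.1f} uniprocessor equivalents")
    print(f"three 20-ways:   {three_by_20:.1f} uniprocessor equivalents")
```

under these assumptions the three 20-way complexes come out ahead even after giving up ten percent to loosely-coupled coordination. a workload that needs fine-grain sharing across all 60 processors would push `coupling_efficiency` down much further and could reverse the result.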
we saw that in the late 80s when we got involved in both fiber channel standard effort as well as the scalable coherent interface standard effort.
FCS was obviously a loosely-coupled technology ... which we worked on when we were doing ha/cmp
also minor reference here
One of the engineers in austin had taken some old fiber optic communication technology that had been lying around POK since the 70s (eventually announced as escon on mainframes) and did various tweaks to it ... got it running at about ten percent faster effective thruput, and adapted some optical drivers from the cdrom market segment that were less than 1/10th the cost of the drivers that had been defined in POK. This was adapted for full-duplex operation (simultaneous full bandwidth transmission in both directions) and released as SLA (serial link adapter) for rs/6000. Almost immediately he wanted to start on a proprietary version of it that would run 800mbits (simultaneously in both directions). Since we had been working with the FCS standards operation, we lobbied long and hard to drop any idea of doing a proprietary definition and instead work on the FCS standard (1gbit, full-duplex, simultaneously in both directions). Eventually he agreed and went on to become the editor of the FCS standards document.
SCI could be used in purely tightly-coupled operation ... but it had a number of characteristics which also could be used to approximate loosely-coupled ... and then there were the things in-between ... for NUMA (aka non-uniform memory architecture).
SCI could operate as if it was memory references ... but provide a variety of different performance characteristics (somewhat analogous to old 360 LCS ... where some configurations used it as extension of memory for standard execution and other configurations used it like electronic disk .... more akin to 3090 extended store).
sequent and dg took standard four intel processor shared memory boards ... and configured them on the 64-port SCI memory interface for a total of 256 processors that could operate as a shared memory multiprocessor.
convex took two HP processor shared memory boards ... and configured them on the 64-port SCI memory interface for a total of 128 processors that could operate as a shared memory multiprocessor.
while background chatter for sci is very low ... actually having a lot of different processors hitting the same location constantly can degrade much faster than more traditional uniform memory architecture. at some point the trade-off can cross.
so partitioning can be good ... convex took and adapted MACH for the exemplar. one of the things they could do to cut down fine grain coordination scale-up issues is partition the exemplar into possibly 5-6 twenty-processor shared memory multiprocessors ... then they could simulate loosely-coupled communication between the different complexes using synchronous memory copies.
this was partially a hardware scale-up issue ... scaling a shared kernel that was constantly hitting the same memory locations from a large number of different real processors ... and partially using partitioning to manage complexity growth. this is somewhat like how LPARs are used to partition to manage complexity of different operations that may possibly have somewhat different goals ... which would be a lot more difficult using a single system operation.
for other historical topic drift ... MACH was picked up from CMU ... the same place that andrew file system, andrew windows & widgets, camelot, etc had come out of. In this period there was Project Athena at MIT ... jointly funded by DEC and IBM to the tune of $25m each (from which came Kerberos, X, and some number of other things), while IBM funded CMU to the tune of $50m. Mach was also picked up as the basis for NeXT and later for the apple operating system (among others).
LANL somewhat sponsored/pushed HiPPI thru the standards organization (as a standard version of Cray's copper parallel channel). LLNL somewhat sponsored/pushed FCS thru the standards organization as a fiber version of a serial copper connectivity that they had deployed. And SLAC somewhat sponsored/pushed SCI thru the standards process.
misc. old posts mentioning HiPPI, FCS, and/or SCI