IBM 610 workstation computer 3449
a two-way 370 processor adds cache overhead such that a two-way 370 is considered to be at best 1.8 times the performance of a uniprocessor. part of the issue is that the 370 caches are slowed down 10 percent to allow for cross-cache chatter in support of cache coherency. any cache-thrashing invalidates would further degrade the smp thruput compared to uniprocessor.
using the 1.8 times hardware thruput rule-of-thumb ... actual system thruput was frequently pegged at 1.5 times ... because of additional kernel overhead managing the smp environment.
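the arithmetic behind the two rules-of-thumb can be sketched as below. this is only a back-of-envelope model, not anything taken from the hardware or kernel: the function names are hypothetical, and it assumes the cache slow-down and the kernel overhead each act as a simple multiplicative factor on every processor.

```python
# back-of-envelope model of the rules-of-thumb quoted above.
# assumption (not from the original posts): cache slow-down and kernel
# overhead each scale per-processor thruput by a simple factor.

def smp_throughput(n_cpus, cache_slowdown=0.10, kernel_overhead=0.0):
    """thruput relative to one full-speed uniprocessor (= 1.0)."""
    per_cpu = (1.0 - cache_slowdown) * (1.0 - kernel_overhead)
    return n_cpus * per_cpu

def implied_kernel_overhead(observed, n_cpus=2, cache_slowdown=0.10):
    """fraction of each processor the kernel would have to consume to
    bring the hardware-only figure down to the observed one."""
    return 1.0 - observed / smp_throughput(n_cpus, cache_slowdown)

if __name__ == "__main__":
    print(round(smp_throughput(2), 3))             # hardware-only figure
    print(round(implied_kernel_overhead(1.5), 3))  # overhead implied by 1.5 times
```

under those assumptions, two processors each running at 90 percent give the 1.8 times figure, and an observed 1.5 times would correspond to roughly one sixth of each processor going to smp kernel overhead.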
IBM 610 workstation computer 3450
ref: the problem was that with one cpu ... the cache ran at full-speed. with two cpus...
the original smp software version adapted from vamps ... turned out to require a minimal number of changes to a uniprocessor kernel, had almost zero lock contention in normal operation and managed close to the 1.8 times thruput (having almost zero incremental smp software overhead). there were even a couple of cases of greater than 2 times thruput (with some funny situations involving improved cache hit ratios compared to uniprocessor).
IBM 610 workstation computer 3454
KR Williams I will describe a machine I was programming in 1993. The main processor was an Intel 8080 which controlled the applications hardware via its 8 bit ports. One...
this changed with a rewrite of the code for sp1. with the 3081, there was no longer going to be a uniprocessor option. one of the major operating systems was TPF (transaction processing facility, the renamed airline control program used heavily by airline reservation systems ... but also starting to see a lot of use in financial transaction networks).
the problem was that TPF didn't have smp support ... and that generation of computers was smp only. you could take a 3081, bring up vm370 on the machine (with smp support) and run tpf in a single processor virtual machine (under vm370). the issue was that in a dedicated TPF environment the 2nd 3081 processor would be idle most of the time.
part of the issue was that standard virtual machine operation ... involves virtual machine execution, with the various kinds of privileged operations then being handled by the vm370 kernel. this processing was serialized for a single virtual machine (alternating virtual machine execution with kernel execution ... multiprocessor execution was normally achieved by having multiple processor virtual machines and/or having lots of independent virtual machines).
IBM 610 workstation computer 3453
this is somewhat dependent on the traditional spin-lock approach to global system lock. this was...
so a gimmick was created to re-organize the multiprocessing implementation that would enable asynchronous execution of the TPF virtual machine with kernel code executing stuff on behalf of the TPF virtual machine. this rework increased the overall smp kernel overhead by about ten percent of each processor ... but it allowed the 2nd 3081 processor to get about another 50 percent thruput in the single, dedicated TPF operation.
the issue was that the dedicated TPF operation was only a small percentage of the multiprocessor customers ... but the new kernel rework took an additional ten percent of every customer smp processor (reducing their overall thruput ... except for the small number of dedicated TPF 3081 installations).
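the trade-off can be put in rough numbers. the sketch below is purely illustrative: the function names are made up, every installation is weighted equally, the "another 50 percent" is read as a net relative gain for a dedicated TPF machine, and the ten percent is read as a relative loss for everybody else. none of those assumptions come from measurements.

```python
# illustrative only: fleet-wide effect of the kernel rework.
# assumptions (not from the original posts): equal weight per
# installation, +50 percent net for dedicated TPF machines,
# -10 percent for all other smp installations.

def net_fleet_change(tpf_fraction, tpf_gain=0.50, other_loss=0.10):
    """average relative thruput change across all smp installations."""
    return tpf_fraction * tpf_gain - (1.0 - tpf_fraction) * other_loss

def break_even_fraction(tpf_gain=0.50, other_loss=0.10):
    """share of dedicated TPF installations at which gain and loss cancel."""
    return other_loss / (tpf_gain + other_loss)

if __name__ == "__main__":
    print(round(break_even_fraction(), 3))     # share needed to break even
    print(round(net_fleet_change(0.05), 3))    # e.g. 5 percent TPF shops
```

with those numbers, dedicated TPF shops would have to be about one in six smp installations before the rework broke even across the customer base; at a small percentage the net effect is negative.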
eventually a 3083 uniprocessor was announced for the TPF market. this was a uniprocessor version of the 3081 ... with the smp cache slow-down removed.
IBM 610 workstation computer 3452
Using any approach requiring a global system lock seems to prevent any system from scaling to more than 4-5...