| PLEX86 | ||
IBM 610 workstation computer 3451

there was a separate issue-paper involving TSS/360 on the 360/67 which claimed over three times the thruput on a two-processor 360/67 compared to a single processor.

IBM 610 workstation computer 3455

On Sat, 25 Feb 2006 02:24:30 +0000, Andrew Swallow
They work when you have more than one CPU too. This is a point that...

my scenario was a particular workload which had heavy asynchronous i/o interrupts, causing lots of cache line replacement (a high cache miss rate). some hacks in the smp support did some stuff for processor-cache affinity ... the improved cache hit rate from having two caches more than offset the degradation introduced by smp cross-cache chatter.

IBM 610 workstation computer 3456

Keith The point that you are missing is that *I have seen them fail.* BAH...

the 360/67 didn't have cache ... the maximum memory on a uniprocessor was one megabyte, but you could double that to two megabytes in a two-processor configuration. the tss/360 kernel fixed memory requirement was nearly 700kbytes ... on a one mbyte system, that left possibly 300kbytes for applications ... which resulted in a lot of page thrashing (and very low cpu utilization). going to a two-processor system increased the memory for application pages by nearly a factor of four (from about 300k to about 1.3m).

the evidence was that thruput was highly paging constrained ... since the second processor strictly only doubled the processor thruput (the 360/67 didn't have the cache gimmick where processor thruput is a function of both the cache hardware performance and the cache hit ratio). however, the tss/360 thruput was almost directly proportional to the amount of real storage available for application pages.

what i didn't say about the 370 uniprocessor to 370 two-processor comparison was that both workloads ran at 100 percent cpu utilization of all available processors. assuming identical cache hit rates, strict hardware thruput should have been only 1.8 times the uniprocessor (with an ideal smp kernel pathlength implementation) ...
the additional thruput was because of some gimmicks with cache hit rate from processor-cache affinity. the tss/360 360/67 "improvement" was because the single-processor thruput was highly real-storage constrained and had very low cpu utilization. the two-processor configuration approximately doubled the cpu power (which wasn't the limit in the single-processor scenario) but increased real storage for application pages by approximately a factor of four ... which was the limiting factor.

the 360/67 multiprocessor nominally had a lower mip rate ... not because of cache and cache coordination ... but because of the memory bus. the 360/67 uniprocessor had a single-ported memory bus with a 750ns cycle time for an 8-byte storage access. the 360/67 multiprocessor implementation used a multi-ported memory bus ... which slightly increased the memory cycle time (and resulted in a slower mip rate).

IBM 610 workstation computer 3452

Using any approach requiring a global system lock seems to prevent any system from scaling to more than 4-5 CPUs available for...

however, in a workload that had both heavy cpu utilization and heavy i/o activity ... on the uniprocessor "simplex" machine there was lots of memory contention between processor and i/o (resulting in reduced processor thruput). with the multiprocessor, multi-ported memory bus ... heavy i/o caused much less memory interference with cpu use, and less memory contention between multiple cpus and i/o.

a "half-duplex" 360/67 with a single processor had a lower idealized mip rate than a "simplex" 360/67 because of the fixed additional latency introduced by the multi-ported memory bus. however, in a workload that was both cpu intensive and i/o intensive, a "half-duplex" 360/67 had a higher effective mip rate than a simplex 360/67 (because of reduced memory bus contention between cpu and i/o).
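
the real-storage arithmetic behind the tss/360 "factor of four" claim can be checked with a quick sketch. the 700kbyte kernel figure and the one and two megabyte limits come from the posts above; treating a megabyte as 1024 kbytes, and assuming a single kernel copy shared by both processors, are my assumptions, and the names are made up for illustration.

```python
# Rough check of the real-storage figures quoted in the posts.
# Assumptions (not from the posts): 1 megabyte = 1024 kbytes, and the
# two-processor configuration still needs only one copy of the kernel.
KERNEL_FIXED_KB = 700            # "nearly 700kbytes" fixed kernel requirement
UNIPROCESSOR_MEMORY_KB = 1024    # one megabyte maximum on a uniprocessor
TWO_PROCESSOR_MEMORY_KB = 2048   # two megabytes in a two-processor configuration


def application_storage_kb(total_kb: int, kernel_kb: int = KERNEL_FIXED_KB) -> int:
    """Real storage left for application pages after the fixed kernel requirement."""
    return total_kb - kernel_kb


single = application_storage_kb(UNIPROCESSOR_MEMORY_KB)
dual = application_storage_kb(TWO_PROCESSOR_MEMORY_KB)
ratio = dual / single

print(f"one processor:  {single} kbytes for application pages")   # 324
print(f"two processors: {dual} kbytes for application pages")     # 1348
print(f"ratio: {ratio:.1f}x")                                     # 4.2x
```

so doubling the processors roughly doubles cpu power, but the storage available for application pages goes up by about four times, which matches the "about 300k to about 1.3m" figures in the post.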