PLEX86  x86- Virtual Machine (VM) Program

IBM 610 workstation computer 3450



the problem was that with one cpu ... the cache ran at full-speed. with two cpus, the cache ran at full-speed ... but its use by the local cpu was slowed down by 10 percent to accommodate cross-cache chatter coming from the other cache (as part of cache coherency) ... basically the overall processor machine cycle ran 10 percent slower.

with four cpus ... each cache was further slowed down to accommodate cross-cache chatter from three other cpus. by the time they got to 3090, the cache implementation had to use technology that ran significantly faster than the processor machine cycle ... in order to mask any degradation caused by the massive amounts of cross-cache chatter.

now that was just the basic cache slow-down to accommodate cross-cache processing ... any cache slow-down involving actually invalidating cache-lines from signals coming from other caches was over and above the base machine cycle.
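
as a rough worked example of the two-cpu case ... a minimal sketch, assuming "10 percent slower" is read as each cpu delivering 90 percent of the single-cpu rate. the function name and the percent units are made up for illustration, and the result is only a hardware ceiling ... it ignores the cache-line invalidates described above and any software overhead.

```c
#include <assert.h>

/* Aggregate hardware throughput, expressed in percent of one uniprocessor.
   Assumption (not from the post): every cpu runs at the same reduced rate. */
static int aggregate_percent(int cpus, int per_cpu_percent) {
    assert(cpus > 0 && per_cpu_percent >= 0);
    return cpus * per_cpu_percent;
}
```

under that reading, `aggregate_percent(2, 90)` gives 180 ... i.e. the two-cpu machine tops out at about 1.8 times a single cpu before any invalidate traffic is counted.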

part-way into the 3084 (4-way) processor time-frame there were significant projects to restructure most of the major kernels to carefully cache align kernel data structures in order to minimize cache-line thrashing (one case was different data structures overlapping in the same cache line). this cache-line data structure reorganization is claimed to have resulted in 5-6 percent overall system thruput improvement (storage alterations were still causing the cross-cache cache-line invalidates to be broadcast, but the same cache line was much less frequently the subject of concurrent use by multiple processors).
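
the cache-alignment idea can be sketched in present-day C ... this is only an illustration, not the actual kernel changes. the 64-byte line size, the struct names and the counter fields are all assumptions, and `line_of` assumes the struct itself starts on a line boundary.

```c
#include <assert.h>    /* static_assert */
#include <stdalign.h>  /* alignas, alignof */
#include <stddef.h>    /* offsetof, size_t */

/* Assumed cache-line size; the real value is machine-specific. */
#define CACHE_LINE_BYTES 64

/* Unpadded layout: two unrelated counters, each updated by a different cpu,
   sit next to each other and so typically land in the same cache line. */
struct packed_counters {
    unsigned long cpu0_dispatches;
    unsigned long cpu1_dispatches;
};

/* Aligned layout: each counter starts on its own cache-line boundary. */
struct aligned_counters {
    alignas(CACHE_LINE_BYTES) unsigned long cpu0_dispatches;
    alignas(CACHE_LINE_BYTES) unsigned long cpu1_dispatches;
};

/* Index of the cache line that holds a given byte offset, counting from
   the start of a struct that begins on a line boundary. */
static inline size_t line_of(size_t offset) {
    return offset / CACHE_LINE_BYTES;
}

/* The aligned layout spends a full line per counter. */
static_assert(sizeof(struct aligned_counters) == 2 * CACHE_LINE_BYTES,
              "each counter should occupy its own cache line");
```

the trade-off is storage for less thrashing ... the aligned layout is larger, but a store by one cpu no longer invalidates the line the other cpu is using for its own counter.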

IBM 610 workstation computer 3452
Using any approach requiring a global system lock seems to prevent any system from scaling to more than 4-5 CPUs available for useful work.
+---------------
Yup, exactly. That's when you have to start spending serious bucks...

the heavy penalty paid by 370 multiprocessor cache implementations for extremely strong memory consistency ... was one of the reasons that i've claimed that 801-risc went to the opposite extreme .... and you didn't even find hardware cache consistency between the separate I(nstruction) and D(ata) caches on the same processor. this also manifested itself in system "program loaders" requiring a new (software) instruction to force changed cache-lines from the data cache to memory (so that program instruction memory areas that the program loader may have been treating as data and altered ... were forced to memory ... so the alterations would be picked up by the instruction cache when the loaded program started running). basically the system program loader is a special case where instruction memory areas would be treated as data areas and modified.

this slightly drifts into the thread posting on buffer overflow exploits. it becomes slightly more problematic for architectures with separate I & D caches (with no automatic cache consistency). any storage alterations that are resident in the (store into) data cache may take some time before appearing in memory and being visible to the instruction cache.
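
a toy model of the loader case, as a minimal sketch ... it is not 801 code and the names are hypothetical. it assumes a single word of storage, a store-into data cache, and an instruction fetch that misses in the instruction cache and so reads memory directly, never looking at the data cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One word of "program memory" plus its copy in a store-into data cache. */
struct toy_machine {
    int  memory;        /* contents of main storage */
    int  dcache;        /* data-cache copy of the same word */
    bool dcache_dirty;  /* true if the data cache holds a change not yet in memory */
};

/* The loader alters an instruction by treating it as data:
   the change lands in the data cache only. */
static void store_as_data(struct toy_machine *m, int word) {
    assert(m != NULL);
    m->dcache = word;
    m->dcache_dirty = true;
}

/* Software-issued flush: force a changed data-cache line out to memory. */
static void flush_dcache_line(struct toy_machine *m) {
    assert(m != NULL);
    if (m->dcache_dirty) {
        m->memory = m->dcache;
        m->dcache_dirty = false;
    }
}

/* Instruction fetch on an instruction-cache miss: filled from memory,
   with no hardware check of the data cache. */
static int fetch_instruction(const struct toy_machine *m) {
    assert(m != NULL);
    return m->memory;
}
```

in this model, a fetch issued right after `store_as_data` still returns the old word ... only after `flush_dcache_line` does the fetch see what the loader wrote.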

IBM 610 workstation computer 3451
there was a separate issue-paper involving TSS-360 on 360-67 which claimed over three times the thruput on two-processor 360-67 compared to single processor. my scenario...


Alt Folklore Computers Newsgroups
