for the original cas, 370 cache machines had a very strong memory model and store-thru caches. stores invalidated all other caches in the complex. the fetch for the compare basically did the invalidate and held serialization until the store completed (or the compare failed).

other memory model implementations, and store-in or store-thru cache designs, were still supposed to preserve the CAS semantics.
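
as a minimal sketch of the semantics that have to be preserved (not the 370 implementation itself), here is a compare-and-swap retry loop written with C11 atomics. the function name cas_add and the choice of a 32-bit counter are assumptions for illustration only; atomic_compare_exchange_strong is the standard C11 call, and like 370 CS it hands back the current storage contents when the compare fails.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper (name is an assumption): atomically add `delta`
 * to `*word` using a compare-and-swap retry loop, and return the new
 * value that this caller stored. */
uint32_t cas_add(_Atomic uint32_t *word, uint32_t delta)
{
    /* Fetch the current contents; this is the value the compare uses. */
    uint32_t old = atomic_load(word);
    uint32_t new_value;

    do {
        new_value = old + delta;
        /* If *word still equals old, new_value is stored atomically.
         * Otherwise the store is suppressed, old is refreshed with the
         * current contents of *word, and the loop retries. */
    } while (!atomic_compare_exchange_strong(word, &old, new_value));

    return new_value;
}
```

whatever the underlying memory model or cache protocol, the loop only works if the compare and the store appear as one indivisible operation to every other processor.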

two-processor 370 smp cache machines would slow the uniprocessor machine cycle down by 10 percent to allow for the cache processing time associated with sending out the invalidates ... so the base hardware of a two-processor smp was 2 x 0.9 = 1.8 times the hardware performance of a uniprocessor (as a starting point; machine cycles for actually processing invalidates and any cache thrashing would further degrade hardware thruput).
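
the arithmetic above can be written out as a tiny C sketch. the function name smp_base_throughput and its parameterization are assumptions for illustration; only the two-processor, 10 percent case comes from the text, and the result is the starting-point figure before invalidate processing and cache thrashing are counted.

```c
/* Hypothetical helper (name is an assumption): base hardware throughput
 * of an smp relative to one full-speed uniprocessor, when every
 * processor's machine cycle is slowed by the fraction `slowdown`
 * to leave time for sending cross-cache invalidates. */
double smp_base_throughput(int processors, double slowdown)
{
    /* Each processor delivers (1 - slowdown) of a uniprocessor. */
    return processors * (1.0 - slowdown);
}
```

calling smp_base_throughput(2, 0.10) gives the 1.8 figure quoted for the two-processor case.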

It was submitted to a DECUS as one of the pre-DECUS paper presentation seminar proposals but turned down. I haven't managed to convince another co-author to post it. I have it...

3081 was announced as a "dyadic" ... two-processor smp ... but not in the 360 & 370 sense, where the machine could be partitioned and operated as multiple independent uniprocessors. 3081 was never intended to have a uniprocessor version.

That's been my impression. This is a very small step. Note that this is handing out only the knowledge that had been shipped. There is nothing about the thinking w.r.t...

there was some issue with ACP-TPF (operating system for airline res systems and some number of high-performance financial transaction systems), which had cluster support (for scalability and availability) but didn't have SMP support. Upgrading TPF to the newer 3081 processor resulted in the 2nd processor being idle (other than at a large number of installations that ran vm-370 on 3081s with two copies of TPF ... each one with affinity to one of the 3081 processors). A lot of the TPF customers were looking for flat-out raw performance ... and since TPF didn't have SMP support ... eventually a uniprocessor 3083 was announced (which was never planned for in the original 308x products). The 3083 processor ran an almost 15 percent faster machine cycle (compared to the 3081 processor machine cycle) because it didn't need the 10% cross-cache invalidate slow-down.

Later mainframes (especially those with a higher number of processors) ... started to run the cache machine cycle at a much higher rate than the processor cycle (to help mask the cross-machine invalidate overhead).

Other processor architectures might have weaker memory consistency models and other cache consistency protocols ... but the hardware implementations for CAS should still support the CAS semantics.

There is that. It is also the ability of any CPU to be able to pick up where another CPU left off without having to reexecute the same code to set up the job context...

