PLEX86  x86- Virtual Machine (VM) Program

Performance and Capacity Planning 731



ok, the 158 was nominally a one MIP machine, based on various kinds of avg. workload mixes and avg. measured cache hit-miss ratios.

ibm two-way processors had an extremely strict memory coherency architecture .... most of the numa architectures had a slightly more relaxed memory coherency architecture ... another posting from this thread:

Performance and Capacity Planning 735
all vamps processors ran identical microcode and instructions. in vamps, the global kernel lock metaphor ... just precluded more than one processor at a time executing in the kernel. in...

in any case, when IBM did two-processor smp (on 370 cache machines) ... they slowed each processor down by 10% to allow for cross-cache chatter latency (used to maintain strong memory coherency). As a result ... a 370 two-processor system was rated at a basic 1.8 times a single-processor system (to account for the minimum, basic delays for handling cross-cache chatter). There could be additional hardware degradation when the two caches were actually broadcasting cross-cache invalidates back & forth at each other. On top of that, most SMP kernels had additional cross-cpu chatter protocols which further limited workload thruput. Actual workload thruput on a two-processor 370 smp could be expected to be 1.5 times, or less, that of a one-processor system.
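The rating arithmetic above can be written out as a small sketch. The 10% slowdown and the 1.5 typical figure come from the post; the function and variable names are only illustrative.

```python
# Figures from the post: each processor in a two-way 370 SMP was slowed
# by 10%, and real workloads typically saw 1.5x (or less) of a uniprocessor.
SLOWDOWN = 0.10      # per-processor slowdown for cross-cache chatter
PROCESSORS = 2
TYPICAL = 1.5        # typical real-workload thruput, relative to one processor


def rated_throughput(processors: int, slowdown: float) -> float:
    """Hardware rating relative to one full-speed uniprocessor."""
    return processors * (1.0 - slowdown)


rated = rated_throughput(PROCESSORS, SLOWDOWN)

# Share of the rated capacity that went to cache invalidates plus
# kernel cross-cpu chatter in the typical case.
shortfall = (rated - TYPICAL) / rated

print(f"rated {rated:.1f}x, typical {TYPICAL:.1f}x, "
      f"shortfall {shortfall:.0%} of rated")
```

So roughly a sixth of the already-reduced rating was lost before any useful work got done in the typical case.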

so i did a lot of work on SMP kernel support, and one of my early test machines was a production 158AP at the consolidated US HONE complex in cal (when the 158AP was first announced and shipped)

... HONE was by then the worldwide field, sales, and marketing support vehicle.


I did some sleight of hand in the SMP support ... and (from hardware monitor) was getting about 1.5MIPS out of one processor and 0.9MIPS out of the other processor (or 2.4 aggregate MIPS). Some of the sleight of hand was to schedule various parts of the workload so they ran for longer consecutive periods ... as a result they had a higher cache hit rate ... and therefore a higher MIP rate. That was coupled with hiding and-or making the kernel cross-processor chatter almost non-existent ... so the effective workload thruput characteristics on an SMP were very close to UP kernel workload thruput (drastically minimizing kernel overhead for operating a multiprocessor configuration). lots of SMP postings

Note this was different than a TSS-360 claim from the early 70s. TSS-360 on the 360-67 was supposedly the strategic product ... and cp-67 (virtual machines, precursor to lpars, etc) from the cambridge science center was an uninvited interloper (at one point when there were 12 people working on cp-cms, there were supposedly 1200 people working on tss-360).


In any case, on a uniprocessor 360-67, tss-360 was getting worse performance supporting 4 interactive users doing approx. the same mixed workload (edit, compile, exec) as cp-cms was getting supporting 35 users. Part of the problem was that on a 1mbyte 360-67 ... the tss-360 fixed kernel requirements left very little for application paging.

In any case, tss-360 did a benchmark on a 2mbyte, 2 processor 360-67 that showed 3.8 times the thruput of tss-360 on a 1mbyte, 1 processor 360-67. The result was a big claim that tss-360 had fantastic multiprocessor support ... that could make two processors run four times faster. The actual issue was that on a 2mbyte configuration, tss-360 almost had enuf room (left over after fixed kernel requirements) for executing application programs.
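A rough sketch of why doubling real memory can more than double what is left for applications. The post does not give the size of the tss-360 fixed kernel, so FIXED_KERNEL_KB below is a made-up number, picked only to show the shape of the effect; it is not a measurement and is not meant to reproduce the 3.8 figure.

```python
# ASSUMPTION: hypothetical fixed-kernel size. The post only says the fixed
# kernel left "very little" for application paging on a 1mbyte machine.
FIXED_KERNEL_KB = 768


def pageable_kb(real_memory_kb: int, fixed_kernel_kb: int = FIXED_KERNEL_KB) -> int:
    """Memory left for application paging after the fixed kernel is loaded."""
    return max(real_memory_kb - fixed_kernel_kb, 0)


small = pageable_kb(1024)   # 1mbyte configuration
large = pageable_kb(2048)   # 2mbyte configuration

print(f"1mbyte: {small}K pageable, 2mbyte: {large}K pageable, "
      f"ratio {large / small:.1f}x")
```

With that assumed kernel size, real memory goes up 2x while pageable memory goes up 5x, so a paging-bound workload can speed up far more than the processor count alone would suggest.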

now along comes the 3081 ... which was supposed to never have a uniprocessor version ... only the two-processor 3081 (and a pair of 3081s for a four-processor 3084). the 3081 had the typical slowed-down processors running at 90 percent to allow for the cross-cache chatter. However TPF (airline control program, acp, etc) didn't have multiprocessor support at the time ... and many TPF systems were already running on the largest uniprocessors available (although in cluster mode ... something like the big, massive HONE complexes ... which were some of the biggest single-system-images at the time). TPF customers couldn't really utilize 3081s ... they either ran with the 2nd cpu idle ... or they ran under VM-370 ... with effectively two copies of TPF in virtual machines ... one for each real processor. Although it wasn't planned, IBM finally came out with a single-processor 3083 (primarily for the TPF crowd) ... with the 2nd processor removed and the slow-down for cross-cache chatter removed ... so the processor ran at full speed rather than 90 percent.


there was some additional work done for the 3081 on kernel structures. prior to the 3081 ... most kernel storage allocation was done w-o regard to cache line boundaries. there was some analysis that if two different storage areas were allocated overlapping in the same cache line ... and two different processors were accessing the two different areas concurrently .... processor performance radically degraded. There was a big effort to re-organize kernel storage allocation so that it was cache aligned and in multiples of cache lines (minimizing the cross-cache thrashing that was going on). This change is claimed to have improved overall customer thruput by five percent.
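A minimal sketch of the allocation change described above, not the actual kernel code. It compares a packed bump allocator with one that starts every area on a cache line boundary and reserves whole lines. CACHE_LINE and the request sizes are assumptions; the post gives no line size.

```python
CACHE_LINE = 128  # ASSUMPTION: hypothetical line size in bytes


def lines_touched(addr, size, line=CACHE_LINE):
    """Set of cache line numbers covered by the area [addr, addr + size)."""
    return set(range(addr // line, (addr + size - 1) // line + 1))


def shares_line(a, b, line=CACHE_LINE):
    """True if two (addr, size) areas overlap in at least one cache line."""
    return bool(lines_touched(*a, line) & lines_touched(*b, line))


def allocate(sizes, aligned, line=CACHE_LINE):
    """Bump-allocate each request and return a list of (addr, size) areas.

    aligned=False packs areas back to back (no regard for line boundaries).
    aligned=True reserves a whole number of lines per area, so every area
    starts on a line boundary and no two areas share a line.
    """
    areas, next_free = [], 0
    for size in sizes:
        reserved = -(-size // line) * line if aligned else size
        areas.append((next_free, size))
        next_free += reserved
    return areas


requests = [40, 72]  # two small control blocks, sizes made up
packed = allocate(requests, aligned=False)
padded = allocate(requests, aligned=True)

print("packed:", packed, "share a line:", shares_line(*packed))
print("padded:", padded, "share a line:", shares_line(*padded))
```

In the packed layout both areas sit in line 0, so two processors updating them independently would keep invalidating each other's copy of that line. The padded layout costs some storage but gives each area its own lines.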

--

