Tuesday, July 19, 2016

Heads up! Performance regression over several releases

A performance regression across several releases, from 4.3 to 4.6, has been observed under low workload (50% workload) on an SMT machine (without the SMT_NICE config).

I just re-tested the 4.3-vrq kernel, 4.5-vrq, and bfs/vrq/test on 4.6. The sanity tests show a major performance regression: 20~30 more seconds in a 7-minute test. Pure BFS on 4.6 is hit hardest, taking about 50% more time (11 minutes).
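To put those numbers on one scale, here is a minimal sketch that converts the timings quoted above into relative slowdowns. The function name is only illustrative, and the 7-minute and 11-minute figures are the rounded values from this post, not precise measurements.

```python
def slowdown_percent(baseline_s, measured_s):
    """Return how much longer measured_s is than baseline_s, in percent."""
    return (measured_s - baseline_s) / baseline_s * 100.0


BASELINE_S = 7 * 60  # the 7-minute sanity test, rounded as in the post

# 20~30 extra seconds reported for the vrq kernels
for extra_s in (20, 30):
    pct = slowdown_percent(BASELINE_S, BASELINE_S + extra_s)
    print(f"+{extra_s}s -> {pct:.1f}% slower")

# Pure BFS on 4.6: about 11 minutes, rounded as in the post
print(f"11 min -> {slowdown_percent(BASELINE_S, 11 * 60):.1f}% slower")
```

With these rounded inputs the vrq kernels come out roughly 5% to 7% slower, while the pure BFS run lands in the region of the "about 50%" quoted above.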

More sanity tests on kernel versions between 4.3 and 4.5 are under way to find out where to look first. Based on the sanity results so far, there could be more than one cause for the regression. I will keep you updated.

BR Alfred

Updates:

After investigation, three factors contribute to the regression.

1. Regression from release to release: minor regressions were recorded from 4.4 to 4.5 and from 4.5 to 4.6 using the default CFS setting. Unfortunately, nothing can be done about this from the BFS/VRQ scheduler's perspective.
2. The interactivity setting, which is on by default and was introduced in BFS 0466 (4.5), contributes about 50% of the regression.
3. The cpufreq_trigger() API introduced in 4.6 was not properly deployed in BFS0470. I have a debug load that improves this API deployment, but it's not robust enough to go public.
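Since the third factor concerns how the scheduler drives cpufreq, a quick user-space check can show whether a CPU is sitting at its minimum clock while under load. The sketch below is only an illustration, not part of any BFS/VRQ tooling: it reads the standard cpufreq sysfs files (`scaling_governor`, `scaling_cur_freq`, `scaling_min_freq`, `scaling_max_freq`, values in kHz), and the helper names `read_cpufreq` and `pinned_at_minimum` are made up for this example.

```python
import glob
import os

# Standard per-CPU cpufreq directories; absent if no cpufreq driver is loaded.
SYSFS_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq"


def read_cpufreq(cpufreq_dir):
    """Read governor and frequencies (kHz) from one cpufreq directory."""
    def read(name):
        with open(os.path.join(cpufreq_dir, name)) as f:
            return f.read().strip()

    return {
        "governor": read("scaling_governor"),
        "cur_khz": int(read("scaling_cur_freq")),
        "min_khz": int(read("scaling_min_freq")),
        "max_khz": int(read("scaling_max_freq")),
    }


def pinned_at_minimum(info):
    """True if the CPU is at (or below) its minimum clock and could go higher."""
    return info["cur_khz"] <= info["min_khz"] < info["max_khz"]


if __name__ == "__main__":
    for path in sorted(glob.glob(SYSFS_GLOB)):
        try:
            info = read_cpufreq(path)
        except (OSError, ValueError):
            continue  # file missing or unreadable on this system
        flag = "  <-- at minimum" if pinned_at_minimum(info) else ""
        print(f"{path}: {info['governor']} {info['cur_khz'] // 1000} MHz{flag}")
```

Run it while a benchmark such as a kernel build is active: an idle CPU at its minimum clock is expected, so the flag only means something under load.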

As 4.7 was released this week, I'd like to address this regression in 4.7. There are major changes to the scheduler code in this release, so it will take longer to port/sync the code. The good news is that the work is about 25% finished, and the first bfs 4.7 kernel is up and running.

4 comments:

  1. Hi Alfred,

    I can confirm that there is some sort of regression (+/- smtnice):

    kernel compilation usually takes 7-8 minutes (I've a pretty monolithic kernel + lots of modules, so it takes some time), but occasionally it takes 14-18 minutes (so TWICE as long) for no observable reason whatsoever, out of the blue

    One factor was that the ondemand governor wasn't clocking the CPU up; it always stayed at 800 MHz (compared to 3.4 GHz and Turbo, if applicable; perhaps Turbo also didn't work that way),

    therefore I'm currently forcefully switching the governor to "performance",

    also there appears to be a regression with intel_pstate currently going on with 4.6 or 4.7, so I went back to the cpufreq governor


    So please make sure that the cpu also does not suffer from being clocked down for no reason

    I'll wait for 4.6.5 to come out before I'll give your new test5 patch (or until then perhaps test6) a try,

    instability turned out to be a weird stability issue with ZFS with my use-case,

    vrq v3 looks stable so far; could you please also keep the repo at github up-to-date like the one at bitbucket?

    converting tags into patches is great but I'd prefer to use BFS solely decoupled from the other patches (BFQ, etc.) and have those e.g. in separate branches


    Thanks

    Replies
    1. @kernelOfTruth
      Thanks for reporting. Just a quick reply for now; the investigation is still ongoing.
      I have noticed a similar problem with the CPU clocking at 800 MHz on my 24/7 server since 4.6, but I thought it was caused by overheating, as it used to be, since I haven't cleaned the dust off its fan for months. But when it's at 800 MHz, the sensors don't report the high temperature they should, so maybe I am wrong about this and have to take it as an issue.

      The intel_pstate driver has never worked well for me.

      There will be no test6 patch, as all my bandwidth is going to the regression issue. As for the regression itself, the default interactive mode introduced in bfs 0470 contributes part of it; I'm still looking at whether there are any other causes.

      BR Alfred

  2. When I noticed the throughput drop in my WCG clients with recent 4.6 VRQ + testX patches, reported earlier on this blog, I gave VRQ0 a try again for some days. {This VRQ0 was based on BFS 0469 with only 4.6 syncup changes, if I followed history correctly.}
    For my simple everyday usage scenario, the kernel didn't perform better or worse than e.g. current VRQ2 test5, also regarding interactivity.

    And, regarding the WCG clients' throughput count: it's not a trustworthy value for measuring the performance of BFS/VRQ/anything. Limiting it to one project's client (first idea) still would not reflect that Firefox uses more CPU time over its running time vs. uptime etc.
    I wondered about the drop, and during the last days, by coincidence, it went up by ~10% over the old average. Never mind; sorry for bothering you with that.

    SCHED_SMT is OFF on my machine, after we've found out together, some years ago, that my Dual Core Intel chipset would not support it, although the two CPU cores ship the ht flag. (So I also don't make use of SMT_NICE.) My governor is also "performance" for many months now, as my notebook usually doesn't run on battery.

    Maybe this info is useful.
    BR, and good luck for your regression hunting,
    Manuel Krause

  3. Thumbs up!

    Great to hear that you could track down most parts of the regression :)
