Saturday, September 17, 2016

Skip list + VRQ for 4.7

I am adding the -VRQ commits on top of the new skip list implementation. All commits will be pushed to the linux-4.7.y-sl branch. Unlike previous -vrq releases, I am also releasing additional checkpoint tags, so users can test against these tags and narrow down which code changes introduce an issue.

Checkpoint tags:
1. 4.7_0472_sl_baseline and all in one patch.
    This is the starting point of all the work; all commits are described in the post "About skip list in BFS".

2. 4.7_0472_sl_new and all in one patch.
    New implementation of skip list; please check the post "New implementation of skip list for BFS".

3. 4.7_0472_sl_new_sync and all in one patch.
    Includes all sync-up commits from release to release which have not yet been picked up by the original BFS.

4. 4.7_0472_sl_new_gc and all in one patch.
    This patch includes all the former -gc commits; the most important one is the "Full cpumask based and LLC sensitive cpu selection" commit, which helps performance under low workload.

5. 4.7_0472_sl_new_vrq and all in one patch.
    This is not the same as the VRQ3 patch; it includes only two major features, the "VRQ solution" and the "preempt task solution". I want to tag a checkpoint here, before the latest stick/cache code changes, as they are not yet finalized.

6. 4.7_0472_sl_new_vrq_full and all in one patch.
    Includes all -vrq features up to VRQ3 in 4.7.
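
To illustrate how the checkpoint tags can narrow down an issue, here is a minimal sketch in Python. It assumes that each checkpoint builds on the previous one in the order listed above, and that once an issue appears it stays in all later checkpoints. The `shows_issue` callback is hypothetical: it stands for whatever manual test you run after building and booting a kernel at that tag.

```python
from typing import Callable, Optional, Sequence

# Checkpoint tags in the order they are listed in this post.
# Assumption: each one is cumulative on top of the previous one.
CHECKPOINTS = [
    "4.7_0472_sl_baseline",
    "4.7_0472_sl_new",
    "4.7_0472_sl_new_sync",
    "4.7_0472_sl_new_gc",
    "4.7_0472_sl_new_vrq",
    "4.7_0472_sl_new_vrq_full",
]


def first_bad_checkpoint(
    tags: Sequence[str],
    shows_issue: Callable[[str], bool],
) -> Optional[str]:
    """Binary-search for the earliest tag at which the issue shows up.

    shows_issue(tag) is a hypothetical stand-in for a manual test
    (build the kernel at that tag, boot it, try to reproduce).
    Returns None if no checkpoint shows the issue.
    """
    lo, hi = 0, len(tags)
    while lo < hi:
        mid = (lo + hi) // 2
        if shows_issue(tags[mid]):
            hi = mid          # issue already present: look earlier
        else:
            lo = mid + 1      # still good: look later
    return tags[lo] if lo < len(tags) else None
```

With six checkpoints this needs at most three test kernels instead of six. If an issue comes and goes between checkpoints, the assumption above does not hold and every tag has to be tested individually.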

...to be continued

BR Alfred

32 comments:

  1. @Alfred:
    With your point 4 things come back into a better shape:
    I've tested Con's last revision 0497 with addon patches, and your point 2 (4.7_0472_sl_new), then went back to the old vrq3debug3 (s1c0) and am now at 4.7_0472_sl_new_gc (omitted the plain sync patch, point 3).

    Your point 4 is a really important one, as it benefits not only performance but also interactivity, IMO, on here.

    BR, and keep up your health, you don't need to push out wonders every two days, ^^
    Manuel Krause

    1. @Manuel,Alfred

      There is now too much to test, I can't keep up (I'm back to my day job).
      I'll definitely try point 4, as it appears to have the full-blown feature set from Alfred.
      Some time later this week I'll post my results to the Google sheet mentioned somewhere before.

      Can we exchange .config files, guys? I'm curious how different they are... We can do that via email, and as Alfred knows both of our emails, he could start :)

      Br, Eduardo

    2. @Alfred,

      what have You done? :) I tested this one (4th one):
      ---------
      BFS CPU scheduler v0.472 by Con Kolivas.
      BFS enhancement patchset v4.7_0472_sl_new_gc0 by Alfred Chen.
      ---------

      There are no issues with gaming anymore, at least what my rather limited testing shows.
      WOT - no issues so far (microstuttering is gone), Unigine benchmarks are good (on par with standard kernel and vanilla BFS), D3 has no issue with stuttering anymore, gameplay is very good.
      I would say - brilliant ;) Thanks for Your efforts!

      Unigine results are here: https://docs.google.com/spreadsheets/d/1EayezAsGlJdXjZbS3b9m7YtvtRF-DJ3xrT3hYCvfymQ/edit?usp=sharing

      I'll continue using this kernel on my work machine as well as at home, and will post if anything pops up.

      br, Eduardo

    3. @Eduardo:
      This one is really very good, isn't it? "Brilliant" is the right word to choose.
      (BTW, of course you can have my .config, just let Alfred start the "chain letter" ;-) )

      I'm at only 20h of uptime atm., but several things with point 4 are worth noting on my system compared to both OLD VRQ3 and Con's 497 BFS:
      * load is equalized better between both cpu cores
      * base load (system+normal) is lower -- and stays lower over longer uptime than ever before, time frame of observation: several months
      * subjective impression of desktop interactivity is much better (BFS497 is worse than old VRQ3, IMO)
      * recovery of desktop interactivity in bottleneck situations (use of /dev/shm with heavy swapping) has improved greatly
      * subsequent processes' throughput suffers less than before -- in my case observed with flash stream watching within Firefox, but this does not count, as it may have other reasons like the remote server's bandwidth at the moment of testing, but it's remarkable

      Alfred, you're really doing great work!
      Thank you very much!
      BR, Manuel Krause

    4. @Eduardo
      Let me explain it a little more. 4.7_0472_sl_new_gc includes the first batch of patches I added on top of BFS; the most important one is the "Full cpumask based and LLC sensitive cpu selection" commit. All these patches used to be in the -gc branch, if you look back at previous kernel releases, and they are considered very stable.

      The test title in your Unigine results is not correct, as checkpoint tag #4 does not include the VRQ patches yet.

      The current -vrq branch includes many more enhancements besides the above -gc patches, but the latest improvement to stick/cache is not yet finalized. That's why I need the debug patches to test which direction to go.

      I have finished putting all -vrq commits on top of this -sl branch and am running sanity tests. As I said above, the latest code is not yet in its best shape, so I'll pick the checkpoint tags carefully to reflect this fact. :)

    5. @Alfred,

      I'll change the title then; since the topic was 4.7 + VRQ, I sort of thought it was all about that.
      Anyhow this one is very good.
      Thanks.

      br, Eduardo

  2. Without any negative attitude, looks like the BFS 497 code doesn't seem to have really settled for now, if you look at:
    http://ck.kolivas.org/patches/bfs/4.0/4.7/Pending/
    and
    http://ck.kolivas.org/patches/bfs/4.0/4.7/Testing/

    @Alfred:
    If I've read your comments carefully enough, you don't completely follow Con's BFS 490/497 way. Are there any of the fixes in the folders above, that may be of interest?
    How much would they differ at the end of the road, the BFS and the future VRQ?
    Fortunately, I never suffered from any related issue posted on Con's blog. But your answer may be interesting, even if the result lies in the future.

    BR, Manuel Krause

    1. @Manuel,Alfred,

      When inspecting my Unigine test results, check the "ondemand" governor results. They show that "ondemand" is not really that great. All BFS versions (w/ or w/o VRQ stuff) suffer from not-that-good "ondemand" results, with the notable exception of VRQ3 (and of course the standard Ubuntu kernel), which perform rather well.

      The Pending and Testing directories in Con's "repo" mostly contain patches which try to address this issue with cpufreq / load signalling. I think it is very good to have ideas / patches / improvements already available; to me that seems more like fine-tuning.

      Benchmarks are mostly throughput-oriented measurements, therefore I'm not sure we can take my benchmarks very seriously, because the whole point of BFS + Alfred's work is to improve _responsiveness_ and _interactiveness_ of the system without sacrificing throughput a lot. But how much is "a lot", I don't know :)

      I'm looking forward to the "... to be continued" article and Alfred's thoughts on this, if he has time ;)

      br, Eduardo

    2. @Eduardo, Alfred:
      Doesn't the wording of "ondemand" already mean, that it's providing xyz (cpu, full clocking, responsiveness, bandwidth, etc.) on _demand_ and this means -> not always? I don't know what you expect.

      I'm at 'performance governor' always on here and won't do any change to it, as keeping results of the last needed patch revisions' tests in mind is quite a hard job. ;-)

      Your benchmarks are very different from Alfred's ones and so it's good to have them mentioned here, so he can decide better, which way to go in future. Thank you!

      In the past I silently wished that Alfred's work would go for more interactiveness. But there is no need with the current one. Point 4 is the best in many months.
      Alfred also always considers his performance measurements before releasing something. At least, that's what he wrote some weeks ago.
      I only see what's happening on my desktop and how fast/slow things go. With the current revision it's as good as it hasn't been for a long time.

      BR, Manuel Krause

    3. @Manuel
      Yes, currently just the skip list related patches are picked up, and I have rewritten most of them. All patches in 480~497 are valuable, but considering that the new BFS code has not settled down yet, "one thing at a time" is the golden rule.
      I don't know how to answer your question about "How much would they differ at the end of the road, the BFS and the future VRQ?"
      Let's see what happens when picking up other commits in 480-497. For example, "stick" has been removed in 497 because of the scheduler-to-cpufreq interaction, but I think testing is needed before making that move in VRQ.

    4. @Eduardo
      Regarding the "ondemand" governor performance regression, do you remember the low workload performance fix in VRQ2? I think it also helps with the ondemand governor, which may explain why VRQ3 does much better than the others. Let's find out with the upcoming checkpoint tags.

      Don't worry about your benchmark, and thanks for providing it. Different benchmarks look at the code in different ways, and that helps to see the full picture. I won't decide where the code goes based on a single benchmark; instead, it is a balanced result of several things.

      To be honest, the compilation tests are the first tests I run; you can check my previous sanity test results for older releases. -gc and -vrq show better throughput than the original BFS and even the mainline CFS (in some workload cases).

      The gaming benchmark, IMO, is not a pure throughput benchmark; it is more like an interactivity test. No stuttering and playable > benchmark score, right?

  3. @Alfred,

    tbh I have heard of the low workload fix, but it didn't affect me, I think. I sorta avoided it, it didn't affect me much or my memory does not serve me well :)
    If low workload fix would help to ensure better "ondemand" behaviour then it's good. When playing a game with nothing else in the background, a scheduler, at least how I understand that, does not have many things to do, there is just one big resource hog - a game, little tasks in the background are little tasks in the background which require some attention, but not much. I might be wrong here, please correct if needed :)

    I started to measure performance of games for myself, which I do not play a lot, but there are some select titles I do, occasionally. That was a measure to know whether I can rely on that particular kernel for games (and how it compares to totally standard kernel) which I play only on desktop (AMD), at least it's measurable. On laptop (all Intel) it's all unclear and soooooo subjective that I don't talk much about gui speed, reaction time to clicks, gut feeling etc. I can not measure interactiveness reliably as I don't do measurable things on it, just writing docs, working with SOA, all things Oracle, etc. "Feeling" about speed is rather relative in my eyes, if something works slowly right now it does not mean that just installed kernel is not up to job, maybe I'm just upset and angry and impatient or just having a bad day, etc. :)
    There are times when speed is there all over the computer, battery life is good, and so are performance and interactiveness, etc.; then it's a kernel for me. The last time that happened was with "v4.7_0472_sl_new_gc0 by Alfred Chen" ;) Thanks!

    "No stuttering and playable > benchmark score, right?" - well it is with the note that framedrop is not large and stuff is responsive :)

    br, Eduardo

  4. @Alfred: blogspot ate my last comment. Can you please check your inbox? Thx, Manuel Krause

    1. @Manuel
      Copying your previous reply here:


      @Alfred:
      >Regarding point 5. "4.7_0472_sl_new_vrq": I'm observing a lagging mouse pointer from time to time, not related to high i/o. Testing is now at almost 48h. Although I'm not having a "bad hair day" today, I want to admit that it's not obvious how to reproduce it.
      >Looking at the long row of commits that went into 'point 5' since 'point 4', I don't know what commits to eventually revert for further testing. Please, advise me in detail, if you find it useful.

      >BR, Manuel Krause

    2. @Manuel
      Ok, here is a patch for you to try (https://bitbucket.org/alfredchen/linux-gc/downloads/v4.7_0472_sl_new_vrq_only.patch). It reverts the preempt task and other features of the #5 patch and includes only the 4 vrq related commits.

      BR Alfred

    3. @Alfred:
      Thank you for taking care of this. Patch is in testing for some minutes now. (So far, nothing related to be noticed.) I'll come back whenever the issue shows up earlier -- or in 24h with an update regarding it.

      BR Manuel Krause

    4. @Alfred:
      Patch "v4.7_0472_sl_new_vrq_only" does behave as well also with much i/o as reported for #point 4 on here at 'September 20, 2016 at 10:40 AM', readable above. Due to this fact, although testing time is very short until now, I consider it very unlikely that the issue is introduced within these 4 commits.
      If you find time, please advise me on the next testing step (adding vrq commits/ maybe another patch from yours) in the meantime. I'd continue using this one until you say: "goto next".
      Maybe we can shorten testing time in this case, as I also hear kernel 4.8 knocking on the door.

      BR Manuel Krause

    5. @Alfred:
      O.k.: Need to change my judgement: Instead of the mentioned occasional mouse pointer lags, which I don't see now, I now get noticeable frame drops in plain .avi video playback in mpv via smplayer, both video+sound, from local disk. They're not related to heavy i/o. And I haven't seen this symptom for a long time. (System's software & kernel .config kept the same.)
      With the recent 502 update from Con I also tried a different .config with his patch. That, only for that trial, had the goal to remove many background but cpu intensive things from my kernel. Now, I don't know, if I should wait for your next testing step -- or re-test this .config also for the most recent 'almost-full-feature' VRQ #point5, as-new.

      Dunno.
      BR Manuel Krause

    6. @Alfred,

      I was on point 5., 4.7_0472_sl_new_vrq+BFQ+WBT, on my desktop system it appears to be running quite well, but on laptop it's broken.
      After I boot (using Unity) and minimize chromium or other app, ibus processes go wild and screen is not redrawing (even clock stops), I can't do anything in GUI. Restarting session (login greeter + X server) does not help at all, it won't start second time. Nothing in the logs (surprise :) ).
      I assume this is a bad combination of this kernel and intel driver. I tried it on Ubuntu 16.10 and Ubuntu 16.04, the same situation, booted 5 times each, GUI hangs up.
      I can switch to terminal just fine though. The end result is that it's not usable for me. None of other kernels shows this issue. Now I use BFS502+BFQ+WBT, all seems to be good.

      Will try the next patch when You make it available.

      br, Eduardo

    7. @Alfred & @Eduardo:
      Dunno, if this is related to vrq. I encountered X session's login to not appear on bootup, KDE on here, quite often, too. Just gets back to console.
      But I blamed it on new/ old Xorg drivers. If I reboot approx. two times, then it'll come up in good shape again.
      The BFS502 doesn't show this behaviour on here.

      BR, Manuel Krause

    8. @Manuel
      How does mpv .avi playback behave with the #5 patch? The v4.7_0472_sl_new_vrq_only.patch implements VRQ but none of the further improvements on top of it, so in some code paths there is more overhead (two spinlocks are needed) than in the non-VRQ version.
      @Eduardo
      OK, the most important thing I want tested is D3 game play with #5. That will help me understand the D3 behaviour and decide how to fix it in the full-featured VRQ patch.

      As for the X issues, they are more likely compatibility issues. Software (both kernel and userland) is compatible with the mainline CFS scheduler but may have issues with BFS, and even more issues with VRQ, because BFS/VRQ run closer to the edge than CFS and trigger some situations that are never triggered under CFS.

      My suggestion is to see how the compatibility issue looks with the final VRQ and not waste time on intermediate checkpoint tags. If it is still there, try to find a workaround (such as building the intel module in, or building it as a module and using module options, etc.).

      BR Alfred

    9. @Alfred,

      I'll try to find a workaround as time permits, or just use bfs502 for the whole week on the laptop; I don't really want to restart it during the week :)

      I'll run D3 this evening which is CET time here.

      Br, Eduardo

    10. @Alfred:
      To not lead this to nonsense output from my side, I've retested all the patches to reassure myself of my (previous) results.
      The v4.7_0472_sl_new_vrq_only.patch leads to mpv-in-smplayer drops in certain cases. I've additionally found out that these cases happen with KDE-desktop-mouse-interaction after some period of inactivity. Can even be the on-here-normally-hidden task bar being used again.
      In some way, this corresponds to the mouse-pointer thing that I've described for the #5 4.7_0472_sl_new_vrq.patch earlier, but with the latter, it doesn't lead to video playback drops - only to subtle mouse lags.
      {Maybe, this is the same issue you've fixed with a short patch in the older, pre-BFS480, VRQ2-VRQ3 development phase?!}

      So, my conclusion is atm. for your ongoing further full-feature-VRQ development: Take Eduardo's findings as more serious than my humble subjective observations (even if they prove your given explanations above).

      I'm looking forward to the final VRQ, wish Happy Developing to you
      BR Manuel Krause

    11. @Alfred,

      I tested D3 with v4.7_0472_sl_new_gc0 (it appears that points 4 and 5 report the same version in the logs). No slowdowns!
      Out of curiosity I tested old 4.6.3+test5+debug2 (I still have it installed :) ) just to test whether system updates fixed the problem or kernel. Results were bad, as reported before.
      This means that new versions of Your patches have improved/fixed strange behavior with D3!

      P.S. Other observation, which is not related to gaming or CPU scheduler directly, is that new versions boot my old crusty HW quite faster than 4.6.3 :)

      br, Eduardo

    12. @Eduardo:
      These patches have the same naming in kernel logging unfortunately.
      But did you really test
      https://bitbucket.org/alfredchen/linux-gc/downloads/v4.7_0472_sl_new_vrq.patch
      versus
      https://bitbucket.org/alfredchen/linux-gc/downloads/v4.7_0472_sl_new_gc.patch

      Please, recheck, otherwise we're back to start without your results.
      Thank you very much for your testing!!!

      BR Manuel Krause


    13. @Eduardo
      Yes, #4 and #5 print the same version in dmesg, but they are different. Please make sure you really test #5.

    14. @Alfred, Manuel,

      I wouldn't post if I weren't sure ;)
      But I doublechecked and yes I was running vrq (#5). I always leave last compilation as it is after compilation, so I was able to check code in build tree.
      It would be good to have unique version printed, though, I imagine You just forgot to change this minor thing in this version :)

      Br, Eduardo

    15. @Eduardo
      Thanks, it really helps. I'll release another checkpoint tag within hours, which includes the latest code in -vrq, but it will likely bring back your D3 playing issue. If so, please apply the debug0 patch on top of it and see how it goes.

      BR Alfred

    16. @Alfred,

      You're right, just tried D3, stutters a _lot_. Will try debug0. Btw, can You please direct me to the right debug0 patch? I'm a bit lost in all those patches right now.

      br, Eduardo

    17. @Eduardo:
      I also got a little confused about the debug patches' numbering. It's most likely, that Alfred refers to debug patch #1, as described in his posting {http://cchalpha.blogspot.de/2016/08/47-debug-patches-call-for-testing.html}, what is the s0c0 variant, turning both stick and cache off and so is supposed to increase interactivity. I can confirm this effect from my testing.
      Link to patch: https://bitbucket.org/alfredchen/linux-gc/downloads/4.7_vrq_test1_debug_s0c0.patch
      Alfred, please correct me if I'm wrong.

      BR, Manuel Krause

    18. Yes, Manuel, you are correct.
      I'd like to use patch names like s0c0, as they make it easy to know what the patch does, but it looks like you guys prefer to use numbers, :)

  5. O.k. patch #6 "4.7_0472_sl_new_vrq_full" is out since two days, and it's in testing on my machine since that afternoon (in my timezone).
    Unfortunately, my machine had a 'bad hair day' (what a picture ;-) ) within the first 24h with this patch, so I relatively early switched to adding "debug0" upon it. In my overall experience it now behaves better, not only regarding interactivity IMHO, comparable with #4 gc.

    But it's really too subjective atm., so please wait for other testers' results.
    Just wanted to let you know, that I haven't abandoned gc/VRQ testing.

    BR, Manuel Krause
