Sunday, October 22, 2017

PDS 0.98c release

PDS 0.98c is released with the following changes:

1. Refine __sched_setscheduler().
2. Task deadline catch-up algorithm V3, which applies the catch-up algorithm only to NORMAL policy tasks.
3. Adjust the next_balance value and fix task balance on low-HZ systems. (The task policy fairness imbalance issue reported by Manuel is still under investigation.)
4. Set the default yield_type to 0. This helps with running Wine, which uses the yield APIs. Yield support in PDS will be removed if there are no complaints.

This is a bug-fix release; hopefully it helps with compatibility and stability.

Enjoy PDS 0.98c for the v4.13 kernel :)

Code is available at
https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-4.13.y-vrq
and also
https://github.com/cchalpha/linux-gc/commits/linux-4.13.y-vrq

An all-in-one patch is available too.

41 comments:

  1. @Alfred:
    I'd like to wait before trying this release until you find a nice solution for the "task policy fairness imbalance issue". TIA for this. So far, I'm quite content with the mentioned revert setup from the last blog entry. I hope my explanations were clear enough.
    Please tell me if you need other info/ debugging or would like me to try preliminary test patches or such.
    BTW, since the discussion of sched_yield in the last blog entry I've also switched to =0 and don't see negative effects. But maybe you can keep the interface in the code for a while, especially if this interface itself doesn't do any harm. IIRC, the older BFS code from Con had followed the =2 approach.

    BR, and many thanks for your work,
    Manuel Krause

    Replies
    1. @Manuel
      For your original setting, 250HZ and the default 6ms rr_interval, sched_balance_interval is set to 0, so it would be the same as reverting the next_balance-related changes.
      For the 1000HZ and 6ms rr_interval setting, sched_balance_interval is set to 2/3 of rr_interval, that is, 4ms.
      My test results for both of the above settings came out as expected, so I encourage you to give this release a try.
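
The arithmetic behind these two settings can be checked with a small sketch. This is not PDS code: the helper names tick_ms and two_thirds_ms are made up for illustration, and the sketch only computes the tick length for a given HZ and two thirds of rr_interval. It does not reproduce the rule PDS uses to choose 0 at 250HZ.

```python
from fractions import Fraction


def tick_ms(hz):
    """Length of one timer tick in milliseconds for a given HZ value."""
    return Fraction(1000, hz)


def two_thirds_ms(rr_interval_ms):
    """Two thirds of rr_interval, in milliseconds."""
    return Fraction(2, 3) * rr_interval_ms


# HZ values mentioned in this thread, with the default 6ms rr_interval.
for hz in (250, 512, 1000):
    print(f"HZ={hz}: tick={float(tick_ms(hz)):.3f}ms, "
          f"2/3 of rr_interval={float(two_thirds_ms(6)):.3f}ms")
```

At 250HZ one tick is already 4ms long, the same as two thirds of a 6ms rr_interval, while at 1000HZ a tick is 1ms.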

    2. @Alfred:
      I had been running the previous kernels at exactly 512HZ with a hand-made patch, originally inspired by Con Kolivas' writings on possible micro-optimisations in the code (that Con discarded relatively quickly).

      Now that you encourage me to test the new PDS patch, I'll do that. But I'm currently at 1000HZ and doing o.k. with it. Should I really go back to 512HZ to verify the usefulness of your "pds: Fix task balance with low HZ system." commit? If I understand your code correctly, it's an exclusive one for cases that do or don't match your targeted HZ.

      BR, Manuel Krause

    3. @Alfred:
      No advantage with PDS 0.98c for me.
      Kernel compilation -j2 sticks at cpu1, IDLE and some very few other NORMALs at cpu0.
      And this is with 1000HZ. I go back to the previous kernel, with reverted patches setup, until it gets equalized.

      BR, Manuel Krause

    4. And in the second compilation attempt (still at PDS 0.98c) it looks even worse, especially when looking at gkrellm's graph. NORMAL tasks frequently switch from one core to the other without need or coordination, still leaving the other core for IDLE tasks only, and then both switch to the other core. It's a bug, not a feature.

      BR, Manuel Krause
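
Per-core load like this can also be recorded as numbers instead of being read off a graph. The following is a minimal sketch, assuming a Linux system with the usual /proc/stat layout; the function names are hypothetical and it is not part of PDS or gkrellm.

```python
import time


def parse_cpu_times(stat_text):
    """Return {cpu_name: (busy_ticks, total_ticks)} from /proc/stat content."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-CPU lines look like "cpu0 user nice system idle iowait irq ...".
        # Skip the aggregate "cpu" line and everything that is not a CPU line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        total = sum(values[:8])  # guest time is already counted in user/nice
        times[fields[0]] = (total - idle, total)
    return times


def busy_percent(before, after):
    """Busy percentage per CPU between two parse_cpu_times() snapshots."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before[cpu]
        elapsed = total1 - total0
        result[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0
    return result


def sample_busy_percent(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and compare them."""
    with open(path) as f:
        before = parse_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_times(f.read())
    return busy_percent(before, after)
```

Calling sample_busy_percent() in a loop during a compilation and logging the result would give a per-core record that can be compared between kernels.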

    5. @Manuel
      I will send you two debug patches for testing tonight.

    6. @Alfred:
      I've just finished my tests and sent the result to your email account. I hope that you can "read" something from my findings, I absolutely can't.
      Please don't hesitate to suggest more tests/ offer other debug patches.

      BR, Manuel Krause

    7. @Alfred:
      (One positive point for github based bug tracking is: One is able to edit postings... :-) )
      I've forgotten to add: Both test kernels are made with the 1000HZ setting to not add more confusion vs. your settings.

      BR, Manuel Krause

    8. @Alfred:
      Yeah, again great work!
      "debug_no_renew_next_balance_on_switch.patch" on top of pds098c does the trick on my machine. Normal reloading of Firefox tabs (2-threaded) loads both cores almost equally, as does kernel compilation -j2. "Spiking" has gone away. Kernel compilation time also decreased vs. plain pds098c.
      I hope you haven't found drawbacks in your benchmarks! IMO you can give it to the public for others to test.

      BR and many thanks, Manuel Krause

    9. I plan an update release this weekend. I want the PDS code to be stable until the next kernel cycle (in 2~3 weeks).
      Now, there is just one task accounting bug (reported by Eduardo) on my list and I have put it on https://github.com/cchalpha/linux-gc/issues . I believe it is just an accounting problem, and Eduardo and I have been working on it.

    10. Hi,

      I'm about to compile a kernel for the accounting issue. Can I have the patch so I can test both accounting and load?

      BR, Eduardo

    11. @Eduardo:
      Did you ask for this "debug_no_renew_next_balance_on_switch.patch" here? If so, and Alfred hasn't sent it to you already, should I send it to you or upload it somewhere?

      BR, Manuel Krause

    12. @Manuel @Eduardo
      I have pushed PDS098d to bitbucket and github. I will post the release notes later.

    13. Hi,

      I asked just here. Please send it to me, I'll compile it together with accounting debug patch, let's see how that works on my machine.

      BR, Eduardo

    14. Hi again,

      Then I don't need the patch, I'll grab it from bitbucket.
      Thanks Alfred.

      BR, Eduardo

    15. Yes, Alfred was faster than me. Just for reference, the working debug patch is just the last hunk of the newest pds commit before tagging it "VRQ 0.98d".

      BR, Manuel Krause

    16. @Alfred:
      ATM I'm observing another "weird" behaviour. It's now the second day with your working debug patch, hibernated overnight, and I'm compiling the kernel for the second time in a row, now with your latest pds commit and 4.13.10. I don't know if my explanation of the symptom and my conclusion will be understandable, but let's try:
      During compilation the compilation tasks begin to stick at ~50% on each core after some time, leaving the rest for the WCG clients. Even after stopping the latter, it takes some time for the compilation to, at once, switch to ~97% again. (Then restarting the WCG clients doesn't disturb compilation.) Is it possible, that it's a degradation over uptime?

      Maybe, after all, a refresh for balancing is missing, but in a _different_ place in the code than the one removed by your latest commit (or the debug patch)?

      New kernel is ready and I'll try to reproduce with it, then report back.

      BR, Manuel Krause

    17. @Eduardo:
      Would you please be so kind to upload your "accounting debug patch" somewhere? Maybe it's of benefit for me too.

      TIA and BR, Manuel Krause

    18. @Manuel
      The debug patch I sent to Eduardo prints debug info to dmesg, so it wouldn't help you atm.

      As for the other "weird" behaviour you reported, I would suggest observing it longer and comparing the behaviour in the following scenarios:
      #1 after a fresh reboot
      #2 with no background workload
      #3 with the "debug_disable_next_balance.patch" on top of 098c

    19. @Alfred:
      Sorry that I didn't do enough thorough tests before your "PDS 0.98d release". But who should have known that behaviour?

      #1: Currently fresh booted pds098d, Firefox has all tabs loaded, then video playback in smplayer added, then kernel compilation -j2 added:
      #1 Result: all fine

      I'll repeat this one at least one time when it's finished. Maybe I hibernate to exclude this as a potential source of the erratic behaviour.
      I also thought about again adding the 'sched_balance_interval = MS_TO_NS(1);' approach from your second debug patch "debug_sched_balance_interval_1.patch" upon pds098d.

      With #2 do you mean really no background load -- not even the IDLE WCG clients and _NO_ webbrowser? That would be a hard time if I shall observe this over longer time... ^^

      BR, Manuel Krause

    20. Maybe I misunderstood the #2 recommendations. Once my FF's tabs are loaded my system only consumes max. ~4% of cpu, due to many usefully forced ABP rules.

      When only taking out the WCG clients, there would be no reason for your algorithms to slow down the compilation tasks, would there?

      BR, Manuel Krause

    21. Can it be that switching to a _different_ task, e.g. on the desktop, wakes up other tasks to balance? BR, Manuel

    22. IMO, if the compile tasks take 50% of each core and IDLE takes the rest, and after stopping the IDLE tasks the compile tasks still take 50%, that means the compile tasks aren't hungry for cpu at that time. That is normal when make is searching for something that needs to be compiled in a non-clean make.
      Try the compile tasks with no background workload so you can see the "normal" behaviour without other interaction.

    23. No, normal compilation immediately should take back both cores ASAP, as it should vs. IDLE tasks or when the latter are killed. To wait any longer for NORMAL is unneeded wasted time.
      ATM, the second compilation attempt with pds098d behaved well for a while, then dropped from full cpu use; after window switching on the desktop, it went to 98% again.

      BR, Manuel Krause

    24. BTW. all compilation is done after "make clean" to make it comparable. BR, Manuel

    25. Unfortunately I just got a complete lockup during normal use of pds098d, which hasn't happened for months. A kernel may have been compiling on the side at that moment.

      BR, Manuel Krause

    26. ATM I'm compiling a 4.13.10 kernel in fashion of #3 and will test for a longer while.

      BR, Manuel Krause

    27. @Alfred:
      The #3 kernel also shows the same behaviour, but much later: after a second hibernation, during the third kernel compilation. Quite often it drops to 50% equally on both cores. The recovery time for the kernel compile tasks to get near 100% got much longer, even if IDLE was killed. ATM it even sticks at 50% on each core.

      So the balancing issue's fix is achieved completely by PDS 0.98d.

      But there must be something else failing, that you haven't touched with your recent work.
      Over time, some tasks/ processes seem to leave/ lose the balancing rules?!

      Hopefully you can read this into code,

      BR, Manuel Krause

    28. 4th kernel compilation did never reach more than 50% of each core, but it was equalised.

      BR, Manuel Krause

    29. @Manuel
      My suggestion is to observe it longer and find out in which scenario this behaviour is triggered. For example:
      #1 no hibernation, compile the kernel 10 times.
      #2 suspend/resume, compile the kernel 10 times.
      #3 hibernate/resume, compile the kernel 10 times.
      And use the time command to see how long each kernel compile takes.
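
Repeated timing like this can be scripted. Below is a minimal sketch, assuming the builds are started from a kernel source tree; the function name time_runs and the make commands in the usage comment are illustrative assumptions, not something from the PDS tree.

```python
import subprocess
import time


def time_runs(command, runs=10, prepare=None):
    """Run `command` `runs` times and return the wall-clock seconds of each run.

    If `prepare` is given, it is executed (untimed) before every run,
    e.g. to clean the build tree so that all runs are comparable.
    """
    durations = []
    for _ in range(runs):
        if prepare is not None:
            subprocess.run(prepare, check=True)
        start = time.perf_counter()
        subprocess.run(command, check=True)
        durations.append(time.perf_counter() - start)
    return durations


# Example usage (assumed commands, run from a kernel source tree):
# for i, secs in enumerate(time_runs(["make", "-j2"], prepare=["make", "clean"]), 1):
#     print(f"build {i}: {secs:.1f}s")
```

Keeping the per-run numbers, rather than only an average, makes it visible whether compile time degrades over uptime or only after a hibernate/resume cycle.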

    30. @Alfred:
      Your suggestion is quite a heavy one. I don't think I can do it on my machine that I do need for internet and so on all the time. But I'd do my best, as I want the best PDS to evolve as you do.
      I maybe don't completely follow your recommendations. FF Browser running with very few tabs, but needed. Video playback via smplayer. WCG clients as IDLE in the background to kick your balancing algorithms.

      BR, Manuel Krause

    31. BTW, in the first tests over 3 compilations with my former fully tabs loaded FF, FF over time needed more CPU, on each compilation, although nothing changed, cutting the compilation percentage.
      This is all done with pds098d at 1000HZ.

      BR, Manuel Krause

    32. @Alfred:
      I'd like to finish my longer-term testing upon task #1 now.
      It didn't show significant problems with pds098d in my more simplified test scenario. I'll send you some kind of ASCII art (text chart) with all info.

      As I never use(d) suspend-to-RAM, I'd like to omit tests for it now.
      But my hibernation is using TOI, not the in-kernel one.

      Please advise me on new tests when you've read my mail!

      TIA and BR, Manuel Krause

  2. So far it works nicely, the previous release had some issues especially when I had to unarchive something or on make modules_install of a new kernel.
    This release seems to work without such issues. Following the previous post, I am using rr int 2, and so far everything is fine. Thank you for your hard work!

  3. @all
    Thanks to all who support PDS by testing and providing feedback. Your effort indeed helps with PDS development. Recently, I noticed that the blog is not a suitable place to track issues, especially issues that span releases, so I am thinking about using https://github.com/cchalpha/linux-gc/issues to track PDS-related issues. What do you think about this?

    Replies
    1. @Alfred:
      Is it still necessary to create an account on github to participate in the issues discussion? If so, some people may find it inconvenient.
      For me, this would not be a problem; I already have an account and am familiar with using the issues section of the TOI project. Indeed, having this "topic-oriented" issue discussion would be more practical for tracking issues over time.

      BR, Manuel Krause
