PDS 0.98c is released with the following changes:
1. Refine __sched_setscheduler().
2. Task deadline catch-up algorithm V3, which applies the catch-up algorithm only to NORMAL policy tasks.
3. Adjust the next_balance value and fix task balancing on low-HZ systems. (The task policy fairness imbalance issue reported by Manuel is still under investigation.)
4. Set the default yield_type to 0. This helps with running Wine, which uses the yield APIs. Yield support in PDS will be removed if there are no complaints.
This is a bug-fix release; hopefully it helps with compatibility and stability.
Enjoy PDS 0.98c for the v4.13 kernel :)
Code is available at
https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-4.13.y-vrq
and also
https://github.com/cchalpha/linux-gc/commits/linux-4.13.y-vrq
An all-in-one patch is available too.
@Alfred:
I'd like to hold off on trying this release until you find a nice solution for the "task policy fairness imbalance issue". TIA for this. So far, I'm quite content with the revert setup mentioned in the last blog entry. I hope my explanations were clear enough.
Please tell me if you need other info or debugging, or would like me to try preliminary test patches or the like.
BTW, since the discussion of sched_yield in the last blog entry I've also switched to =0 and don't see any negative effects. But maybe you can keep the interface in the code for a while, especially if the interface itself doesn't do any harm. IIRC, the older BFS code from Con followed the =2 approach.
BR, and many thanks for your work,
Manuel Krause
@Manuel
For your original setting, 250HZ and the default 6ms rr_interval, sched_balance_interval is set to 0, so it would be the same as reverting the next_balance related changes.
For the 1000HZ and 6ms rr_interval setting, sched_balance_interval is set to 2/3 of rr_interval, that is, 4ms.
My test results for both of the above settings came out as expected, so I encourage you to give this release a try.
@Alfred:
I had been running the previous kernels at exactly 512HZ using a hand-made patch, originally inspired by Con Kolivas' writings on possible micro-optimisations in the code (which Con discarded relatively quickly).
Now that you encourage me to test the new PDS patch, I'll do that. But I'm currently at 1000HZ and doing o.k. with it. Should I really go back to 512HZ to verify the usefulness of your "pds: Fix task balance with low HZ system." commit? If I understand your code correctly, it's an exclusive one for cases that do or don't match your targeted HZ.
BR, Manuel Krause
@Alfred:
No advantage with PDS 0.98c for me.
Kernel compilation with -j2 sticks to cpu1, while IDLE and a very few other NORMAL tasks run on cpu0.
And this is with 1000HZ. I'm going back to the previous kernel, with the reverted-patches setup, until it gets equalized.
BR, Manuel Krause
And in the second compilation attempt (still at PDS 0.98c) it looks even worse, especially when looking at gkrellm's graph. NORMAL tasks frequently switch from one core to the other without need or coordination, still leaving the other core for IDLE tasks only, but then each switches to the other core. It's a bug, not a feature.
BR, Manuel Krause
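For anyone who wants numbers instead of reading the per-core load off gkrellm's graph, here is a minimal Python sketch that computes the busy percentage of each core from two samples of /proc/stat. It assumes a Linux system with the usual /proc/stat field order; the function names (parse_cpu_times, busy_percent, sample_per_core) are made up for this example and have nothing to do with the PDS code.

```python
import time


def parse_cpu_times(stat_text):
    """Return {cpu_name: (busy_jiffies, total_jiffies)} for each per-core line."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 ..."; the aggregate "cpu" line is skipped.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Field order: user nice system idle iowait irq softirq steal [guest guest_nice].
        # Guest time is already counted in user/nice, so only the first 8 are summed.
        values = [int(v) for v in fields[1:9]]
        total = sum(values)
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        times[fields[0]] = (total - idle, total)
    return times


def busy_percent(before, after):
    """Busy percentage per core between two parse_cpu_times() results."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        result[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result


def sample_per_core(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return busy % per core."""
    with open(path) as f:
        before = parse_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_times(f.read())
    return busy_percent(before, after)
```

Calling sample_per_core() in a loop during a -j2 compilation would show whether one core stays near 100% while the other only runs IDLE tasks. Note that this only gives per-core totals; it does not separate NORMAL from IDLE policy tasks.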
@Manuel
Will send you two debug patches for testing tonight.
@Alfred:
I've just finished my tests and sent the results to your email account. I hope that you can "read" something from my findings; I absolutely can't.
Please don't hesitate to suggest more tests/ offer other debug patches.
BR, Manuel Krause
@Alfred:
(One positive point for github based bug tracking is: One is able to edit postings... :-) )
I've forgotten to add: Both test kernels are made with the 1000HZ setting to not add more confusion vs. your settings.
BR, Manuel Krause
@Alfred:
Yeah, again great work!
"debug_no_renew_next_balance_on_switch.patch" upon pds098c does the trick on my machine. Normal reloading of Firefox tabs (2-threaded) loads both cores almost equally, and so does kernel compilation with -j2. The "spiking" has gone away. Kernel compilation time also decreased vs. plain pds098c.
I hope you haven't found drawbacks in your benchmarks! IMO you can give it to the public for others to test.
BR and many thanks, Manuel Krause
I plan an update release this weekend. I want the PDS code to be stable until the next kernel cycle (in 2~3 weeks).
Now there is just one task accounting bug (reported by Eduardo) on my list, and I have filed it at https://github.com/cchalpha/linux-gc/issues . I believe it is just an accounting problem, and Eduardo and I have worked on it.
Hi,
I'm about to compile a kernel for the accounting issue. Can I have the patch, so I can test accounting and load?
BR, Eduardo
@Eduardo:
Did you ask for this "debug_no_renew_next_balance_on_switch.patch" here? If so, and Alfred hasn't already sent it to you, should I send it to you or upload it somewhere?
BR, Manuel Krause
@Manuel @Eduardo
I have pushed PDS098d to bitbucket and github. Will post the release note later.
Hi,
I asked just here. Please send it to me; I'll compile it together with the accounting debug patch, and let's see how that works on my machine.
BR, Eduardo
Hi again,
Then I don't need the patch; I'll grab it from bitbucket.
Thanks Alfred.
BR, Eduardo
Yes, Alfred was faster than me. Just for reference, the working debug patch is just the last hunk of the newest pds commit before tagging it "VRQ 0.98d".
BR, Manuel Krause
@Alfred:
ATM I'm observing another "weird" behaviour. It's now the second day with your working debug patch, hibernated overnight, and I'm compiling the kernel for the second time in a row, now with your latest pds commit and 4.13.10. I don't know if my explanation of the symptom and my conclusion will be understandable, but let's try:
During compilation, the compilation tasks begin to stick at ~50% on each core after some time, leaving the rest for the WCG clients. Even after stopping the latter, it takes some time for the compilation to switch, all at once, to ~97% again. (Restarting the WCG clients after that doesn't disturb the compilation.) Is it possible that it's a degradation over uptime?
Maybe, after all, a refresh for balancing is missing, but in a _different_ place in the code than the one removed by your latest commit (or the debug patch)?
New kernel is ready and I'll try to reproduce with it, then report back.
BR, Manuel Krause
@Eduardo:
Would you please be so kind as to upload your "accounting debug patch" somewhere? Maybe it's of benefit for me too.
TIA and BR, Manuel Krause
@Manuel
The debug patch I sent to Eduardo prints debug info to dmesg, so it would not be of much help to you atm.
For the other "weird" behaviour you reported, I would suggest observing it longer and comparing the behaviour in the following scenarios:
#1 after a fresh reboot
#2 no background workload
#3 with the "debug_disable_next_balance.patch" upon 098c
@Alfred:
Sorry that I didn't do thorough enough tests before your "PDS 0.98d release". But who could have known about that behaviour?
#1: Currently fresh booted pds098d, Firefox has all tabs loaded, then video playback in smplayer added, then kernel compilation -j2 added:
#1 Result: all fine
I'll repeat this one at least one time when it's finished. Maybe I hibernate to exclude this as a potential source of the erratic behaviour.
I also thought about again adding the 'sched_balance_interval = MS_TO_NS(1);' approach from your second debug patch "debug_sched_balance_interval_1.patch" upon pds098d.
With #2 do you mean really no background load -- not even the IDLE WCG clients and _NO_ webbrowser? That would be a hard time if I shall observe this over longer time... ^^
BR, Manuel Krause
Maybe I misunderstood the #2 recommendations. Once my FF's tabs are loaded my system only consumes max. ~4% of cpu, due to many usefully forced ABP rules.
If only the WCG clients are taken out, there would be no reason for your algorithms to brake the compilation tasks, would there?
BR, Manuel Krause
Can it be that switching to a _different_ task, e.g. on the desktop, wakes up other tasks to balance? BR, Manuel
IMO, if compile tasks take 50% of each core and IDLE takes the rest, and after stopping the IDLE tasks the compile tasks still take 50%, that means the compile tasks are not hungry for CPU at that time. That is normal when make is searching for something that needs to be compiled in a non-clean make.
Try the compile tasks with no background workload so you can see the "normal" behaviour without other interaction.
No, normal compilation immediately should take back both cores ASAP, as it should vs. IDLE tasks or when the latter are killed. To wait any longer for NORMAL is unneeded wasted time.
ATM, the second compilation attempt with pds098d behaved well for a while, then dropped from full CPU use, then, after window switching on the desktop, went to 98% again.
BR, Manuel Krause
BTW. all compilation is done after "make clean" to make it comparable. BR, Manuel
Unfortunately I just got a complete lockup during normal use of pds098d; that hasn't happened for months. Maybe a kernel was compiling on the side at that moment.
BR, Manuel Krause
ATM I'm compiling a 4.13.10 kernel in fashion of #3 and will test for a longer while.
BR, Manuel Krause
@Alfred:
The #3 kernel also shows the same behaviour, but much later: after a second hibernation, during the third kernel compilation. Quite often it drops to 50% equally on both cores. The recovery time for the kernel compile tasks to get near 100% got much longer, even when IDLE was killed. ATM it even sticks at 50% on each core.
So the balancing issue's fix is achieved completely by PDS 0.98d.
But there must be something else failing that you haven't touched with your recent work.
Over time, some tasks/processes seem to leave or lose the balancing rules?!
Hopefully you can read this into code,
BR, Manuel Krause
The 4th kernel compilation never reached more than 50% of each core, but it was equalised.
BR, Manuel Krause
@Manuel
My suggestion is to observe it longer and find out in which scenario this behaviour is triggered. For example:
#1 no hibernation, compile the kernel 10 times.
#2 suspend/resume, compile the kernel 10 times.
#3 hibernate/resume, compile the kernel 10 times.
And use time to see how long each kernel compile takes.
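The repeated timing suggested above could be automated roughly as follows. This is only a sketch: it uses Python's subprocess and a monotonic clock instead of the shell's time, and the helper name time_runs as well as the make commands in the usage comment are assumptions for illustration, not something taken from the PDS tree.

```python
import subprocess
import time


def time_runs(build_cmd, runs=10, clean_cmd=None):
    """Run build_cmd `runs` times and return the wall-clock seconds of each run."""
    durations = []
    for _ in range(runs):
        if clean_cmd is not None:
            # Start every run from the same state so the timings are comparable.
            subprocess.run(clean_cmd, check=True)
        start = time.monotonic()
        subprocess.run(build_cmd, check=True)
        durations.append(time.monotonic() - start)
    return durations


# Hypothetical usage from inside a kernel source tree:
#   for i, secs in enumerate(time_runs(["make", "-j2"], 10, ["make", "clean"]), 1):
#       print(f"run {i}: {secs:.1f} s")
```

Running the same loop once per scenario (no hibernation, after suspend/resume, after hibernate/resume) gives three lists of durations that can be compared directly; a slow drift upwards within one list would point to degradation over uptime.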
@Alfred:
Your suggestion is quite a heavy one. I don't think I can do it on my machine, which I need for internet and so on all the time. But I'll do my best, as I want the best PDS to evolve, just as you do.
I maybe don't completely follow your recommendations. FF Browser running with very few tabs, but needed. Video playback via smplayer. WCG clients as IDLE in the background to kick your balancing algorithms.
BR, Manuel Krause
BTW, in the first tests over 3 compilations with my former fully tabs loaded FF, FF over time needed more CPU, on each compilation, although nothing changed, cutting the compilation percentage.
This is all done with pds098d at 1000HZ.
BR, Manuel Krause
@Alfred:
I'd like to finish my longer-term testing upon task #1 now.
It didn't show significant problems with pds098d in my more simplified test scenario. I'll send you some kind of ASCII art (text chart) with all info.
As I never use(d) suspend-to-RAM, I'd like to omit tests for it now.
But my hibernation is using TOI, not the in-kernel one.
Please newly advise me for new tests, when you've read my mail!
TIA and BR, Manuel Krause
So far it works nicely. The previous release had some issues, especially when I had to unarchive something or on make modules_install of a new kernel.
This release seems to work without such issues. Following the previous post I am using rr int 2; so far everything is fine. Thank you for your hard work!
Compiled/booted OK, thanks.
@all
Thanks to all who support PDS by testing and providing feedback. Your effort indeed helps with PDS development. Recently, I have noticed that the blog is not a suitable place to track issues, especially issues that span releases, so I am thinking about using https://github.com/cchalpha/linux-gc/issues to track PDS related issues. What do you think about this?
@Alfred:
Is it still necessary to create an account on github to participate in the issues' discussion? If so, some people may find it inconvenient.
For me, this would not be a problem; I already have an account and am familiar with using the issues section of the TOI project. Indeed, having this "topic-oriented" issue discussion would be more practical for tracking them over time.
BR, Manuel Krause
+1
Voting for github
Same here