PDS 0.99m is released with the following changes
1. [Sync] f29a8be0e5d2 cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
This is a sync-up release for 4.20.8+, and should be the last PDS for 4.20.
Enjoy PDS 0.99m for v4.20 kernel, :)
Code is available at https://gitlab.com/alfredchen/linux-pds
All-in-one patch is available too.
Strange problems with PDS and Akonadi/kmail
Hi Alfred,
I have finally found the cause of my akonadi/kmail problem. It's PDS. I never thought that this could be possible; I blamed akonadi/kde/kmail all the time.
Error: akonadi/kmail hangs while synchronizing some of my imap accounts. It gets stuck at 0% for some imap accounts, while others run fine. If I restart akonadi, then other or sometimes the same accounts hang at 0% and don't synchronize at all. I believed that it was akonadi's fault and that some kde update was the cause (to be fair, akonadi and kmail were really buggy some years ago).
No such error with the standard arch kernel (or the zen kernel). No such error if I only change the scheduler in my kernel .config and go with CFS. Still the same error with PDS, whether I go for (full/"half") tickless or not.
I can't exactly date the first problem with PDS, because I never thought of it in conjunction with this problem. But if I remember correctly, the first time the error occurred was mid December last year.
So any suggestion how to find a solution (and no, I don't want to go with thunderbird ;) )?
Regards sysitos
Tested the 4.20.10 kernel now with MuQSS, no problem with akonadi and kmail. Tried an older PDS version, but there are compile errors with the newest 4.19.23. Maybe I will find a solution for this compile problem.
Regards sysitos
Interesting problem Sysitos.
Could you perhaps do a diff -u between the two configs?
diff -u 4.20.10_PDS_Not_Working/.config 4.20.10_MuQSS_Working_Fine/.config
And post the difference. Just in case there is something else here than the scheduler, and some auto setting that fubars this. (I have had problems with -ck/MuQSS and CONFIG_FORCE_IRQ_THREADING where i get timeouts/errors when doing a "make -j12" compile)
Hi Sveinar,
to be exact, I haven't used plain PDS or plain MuQSS. For PDS I use the pf-kernel; for MuQSS I used your ck-patchset (without the 000_patch-4.20.10.patch, which leads to a compile error after copying my .config). Maybe I should test PDS in plain form to rule out other patches as the cause.
So here is the diff, only in shortened form (mismatches like "# ... is not set" excluded):
-CONFIG_SCHED_PDS=y
+CONFIG_SCHED_MUQSS=y
+CONFIG_RQ_MC=y
+CONFIG_SHARERQ=2
+CONFIG_MQ_IOSCHED_BFQ=y
-CONFIG_UKSM=y
Regards sysitos
Ok,
here is the confirmation. Plain linux 4.20.10 with only Alfred's all-in-one patch and PDS activated leads to the hang in akonadi/kmail. At the moment 4 of my 7 imap accounts are stuck at 0% in the synchronizing queue.
Regards sysitos
Another test,
I checked out pf-kernel branch pf-18, so I get kernel 4.18.0-pf11 with PDS-mq CPU Scheduler 0.99a by Alfred Chen, and the errors with akonadi/kmail are already there. At the moment 3 of 7 imap synchronizing tasks hang at 0%.
Some additional info: some time ago I had only 2-3 imap accounts within kmail/akonadi, and I did not identify the synchronizing hang as a permanent problem in conjunction with PDS, but rather as a temporary problem with something else.
Regards sysitos
Ok,
here are the final results after some more tests:
runs fine: Linux 4.17.0-pf8 with PDS-mq CPU Scheduler 0.98t by Alfred Chen
runs fine: Linux 4.18.0-pf6 with PDS-mq CPU Scheduler 0.98y by Alfred Chen
error starts with: Linux 4.18.0-pf7 with PDS-mq CPU Scheduler 0.99a by Alfred Chen
So Alfred, time for a new bug hunt ;)
Regards sysitos
@sysitos:
Really many thanks to you for elaborating and thoroughly testing your issue!
I assume that other processes/ programs may suffer from the same underlying error, so I hope that your investigations and Alfred's code make PDS better. Hopefully, before next major kernel release... :-)
Best regards,
Manuel
Nothing pops out from that diff, but the "# ... is not set" could be important enough.
The -ck (or my patchset) is intended for use with the plain kernel. The 0000-4.20.10.patch IS the 4.20 -> 4.20.10 patch, so without that, you are running a plain 4.20.0 kernel.
Not really sure what the biggest difference between 0.98 -> 0.99 is, other than some "SMT_NICE" changes, and possibly something making "SCHED_ISO" not work too well. Can't remember off the top of my head, but I think it was around that 4.18'ish time that I could no longer use SCHED_ISO with PDS when playing under wine. I can with MuQSS, so if you run some scheduler-priority-addon thingy, a la "Gamemode daemon" or similar, this could perhaps explain something?
I would say that scheduler has very little to do with this particular hang.
From my experience I would say that it's a race condition in that code, which is exposed by a more interactive scheduler. CFS and MuQSS behave differently than PDS, and PDS just happens to reliably trigger the condition.
I might be wrong :)
BR, Eduardo
I don't know if it's worth this question... @sysitos and all others:
Do you use "threadirqs"? Either on the kernel command line (like me) or with MuQSS/ck as a compiled-in setting? Just wondering if that setting makes a difference.
TIA,
Manuel
@Sveinar,
I downloaded the 4.20.10 kernel source archive and then imported your patches with quilt, so I think I am running 4.20.10 ;)
The 0.98 to 0.99 was a big internal code change.
Regards sysitos
@sysitos
Thanks for the testing and reporting. I will go back and look at the code changes between 098y and 099a. But before I look into the detailed code changes, would you please try a different yield type and see if this helps? That's the only thing I can think of: some special applications may hang (be unable to continue running) if they rely on sched_yield().
And sorry for the late reply here, I was busy with the new scheduler project (runs well so far on my nuc machine) and the 5.0 sync-up work.
@Alfred, good news on new scheduler, any chance we'll see it soon?
BR, Eduardo
@Eduardo
I just made it run on 5.0-rc kernel code today. The only benchmark data I have so far is the vm kernel image boot-up time. It is promising (compared with PDS). Some low-priority features still need to be done, and massive benchmarking will kick off in the 5.0 kernel release.
If all goes well, it will be released sometime in the 5.0 kernel release.
@Alfred, great news, I can offer to help with compilation, gaming and interactivity testing if You feel like You can send me the patch via e-mail, otherwise I'll be patiently waiting for the new thing :)
BR, Eduardo
Hi Alfred
I tested all the yield values on Linux 4.18.0-pf7 with PDS-mq CPU Scheduler 0.99a, always with the described error. Then I changed the rr_interval to 1 and it seems to work; tested it with all yields (0,1,2) too. Afterwards I checked rr_interval from 5..2 to verify the results. It seems that 2 was working too. Hoping I had found the solution, I compiled the newest pf-kernel (and overwrote my old kernel with 099a ;/ ), changed the kernel command line to rr_interval=1, but no success at all. With the new kernel it makes no difference if I change the rr and/or yield values, always the error :(
Btw. is there a kernel parameter for yield too, or only /proc/sys/kernel/yield_type?
So I think I must go the other, long way with git bisect too ;)
Regards sysitos
@sysitos:
Don't give up, according to @Alfred's posting from February 19, 2019 at 6:43 PM (as seen below) it's only about 6 true commits. If you let each run overnight, then we will know in about one week. ;-)
Thanks for your time,
BR, Manuel
Not sure what's the difference between PDS/CFS with the kernel option CONFIG_IRQ_FORCED_THREADING and the added MuQSS option CONFIG_FORCE_IRQ_THREADING... or if they are even supposed to be used together?
I do know that I run with CONFIG_IRQ_FORCED_THREADING=y on both schedulers, but if I enable CONFIG_FORCE_IRQ_THREADING=y with MuQSS, it soon goes tits up with everything that loads the kernel at all (compiling or whatnot).
Great news.
Will it still be called PDS?
There will be no deadline concept in the new scheduler, so PDS is no longer the proper name for it :)
OK.
Do you have any name yet?
Thanks.
@sysitos
The good news is that there are only 6 actual commits between 098y and 099a. I don't want to guess which one is the cause at this time; the best way to find out is to use "git bisect" on the pds 4.18 branch (https://gitlab.com/alfredchen/linux-pds/tree/linux-4.18.y-pds) on your side.
e9140c7fe85f (origin/linux-4.18.y-pds, linux-4.18.y-pds) Tag PDS 0.99a
bac03ec0a243 pds: Fix task burst fairness issue.
4f431cdc66a8 pds: Fix sugov_kthread_create fail to set policy.
7bc1a2c56bee Tag PDS 0.98z
0f294a28dec1 pds: Re-mapping SCHED_DEADLINE to SCHED_FIFO
4f4af01d7ac7 pds: Improve idle task SMT_NICE handling in ttwu.
5a8fbadc4e03 pds: Don't balance on an idle task.
76919a998aa1 pds: Replace task_queued() by task_on_rq_queued().
Hi Alfred,
so some trillions of tests later ;)
Here are the results:
Short:
error starts with: "Fix task burst fairness issue."
Long:
bac03ec0a243 (HEAD, refs/bisect/bad) pds: Fix task burst fairness issue.
4f431cdc66a8 (refs/bisect/good-4f431cdc66a8700629b607d1eec381e85130b2e1) pds: Fix sugov_kthread_create fail to set policy.
7bc1a2c56bee Tag PDS 0.98z
0f294a28dec1 (refs/bisect/good-0f294a28dec1adbcf3c3085204fa922ac2166b9b) pds: Re-mapping SCHED_DEADLINE to SCHED_FIFO
4f4af01d7ac7 pds: Improve idle task SMT_NICE handling in ttwu.
5a8fbadc4e03 pds: Don't balance on an idle task.
76919a998aa1 pds: Replace task_queued() by task_on_rq_queued().
0fb4e82d39ad (refs/bisect/good-0fb4e82d39ad1048aae987acc26d5039eb317dc3) Tag PDS 0.98y
And now some surprise:
I checked some rr_interval values with commit bac03ec0a243.
Result: rr_interval from 1 to 3 seems to run fine (as already seen in the last tests); with rr_interval=4 the error with the hung imap sync queue appears from time to time.
Hope that I could help you.
It seems that the error gets stronger with newer versions, because with the current version the rr_interval doesn't help anymore.
Regards sysitos
@sysitos
Thanks for the bisect tests. A lower rr_interval just makes the issue less likely to be triggered, so it should not be considered a solution.
One more thing, is CONFIG_SCHED_HRTICK=y set in your kernel config file?
Hi Alfred,
CONFIG_SCHED_HRTICK=y is there.
Should I check your new concept scheduler instead, since repairing this old bug in the soon-to-be-deprecated PDS may not be worth it?
Regards sysitos
@sysitos
Would you please send me an email? I'd like to prepare a patch for your debugging.
The new scheduler is based on the PDS code base, so most likely it will have the same issue.
Hi Alfred,
short: your patch helped a lot, but the problem still persists.
long: I applied your patch on top of the pf-kernel and then additionally on top of your pds git tree for 4.20 to rule out other trouble. The results are the same. It's way better than without the patch: now most of the time only 1 (or 0) imap sync process hangs, previously it was 2-3. But now yield has an influence, which imho you already had in mind. So far I have triggered the error only with yield_type=1. 0 and 2 run fine, without the error (with no influence of rr_interval at all, tested different values here). Btw. is the new default rr_interval=4? Wasn't it 6 some time ago?
Regards sysitos
@sysitos
Sorry for the late reply during the weekend. Based on your testing, I believe the issue is caused by sched_yield() usage in user-land code together with a bug in the PDS code, and https://gitlab.com/alfredchen/linux-pds/commit/2fab3ad028e396a9b0de760425052a2ab1444936 is the proper code fix in PDS. Adjusting the yield type would be the workaround for users of affected applications.
As for rr_interval, it was changed to 4ms some time ago, and changing this value is not encouraged.
Hi Alfred,
I can't agree with you in this case. Yes, there are times when the scheduler triggers errors that originate in other applications. But that doesn't seem to be the case here. That's why I tested your patch not only on the newest 4.20 git, but also on older ones, and here are the results:
I couldn't trigger the error with kernel 4.18 and PDS 0.99a (or rather, your last commit on the branch linux-4.18.y-pds). All runs fine. The same is true for branch linux-4.19.y-pds at PDS 0.99b, commit 770c3b622528. No problems at all. But there are problems with your last commit on this branch: the error triggers instantly. I haven't tested other commits yet.
Btw, no error with cfs and muqss with all yields.
Regards sysitos
Hi Alfred,
so I bisected the whole 4.19.y-pds tree (always with your fix applied) and here are my (shortened) results:
51d8f8b86d81 (HEAD, refs/bisect/bad) pds: Rework time_slice_expired()
a473f87a3bd1 (refs/bisect/good-a473f87a3bd13ca95b3838108aa8f3a2f7e0f8e6) pds: Fix cpu hot-plug Oops.
55fdf19c03c1 (refs/bisect/good-55fdf19c03c121144717c95e9b0b177cf1cb883b) pds: [Sync]
c377a2a8bf25 (refs/bisect/good-c377a2a8bf25e30707083156befda486b0e202b8) pds: Remove cpumask_weight() in best_mask_cpu().
770c3b622528 (refs/bisect/good-770c3b6225288fb308631c3a1ede419bbe2d735a) Tag PDS 0.99b
So I hope, that I could help.
Regards sysitos
@sysitos
Thanks for the further testing. Let me explain it this way: sched_yield() is an "evil" system call, which gives up the current task's run time to let other tasks in the system run so the job makes progress. Nowadays there are many ways to do IPC, so the current task can wait on something until other tasks have had cpu time, got the job done and notified it. But it is legacy and it is still used.
It's "evil" b/c it is not reliable: it depends on the scheduler how the yielded task is handled and which other tasks get to run. CFS uses a skip flag in the task structure and BFS/muqss/VRQ/PDS use yield types; the implementations all differ, and none is guaranteed (IMO). So an application using sched_yield() may behave differently under different schedulers/yield types.
Back to PDS: I have checked 51d8f8b86d81 (HEAD, refs/bisect/bad) pds: Rework time_slice_expired(); the code change is correct and as expected. But it fails with some yield types for your application's sched_yield() usage. It still sounds good to me, as there are other yield types that can work around it.
Maybe we should introduce a more reliable way to handle yield in the scheduler, but I believe it's too late for PDS, and even when I think about it for the new incoming scheduler, it will still be a low-priority item. To be honest, if I could control user-land usage, I'd eliminate the sched_yield() system call :)
@Alfred:
If one were to use a non-default sched_yield value as a workaround, like sysitos, which one would you suggest/recommend?
Do they have different impacts that you know of?
Best regards,
Manuel
I have escaped weird behavior by setting the value to 2, mostly related to intel graphics driver, if I remember correctly.
BR, Eduardo
@Eduardo:
Thank you for adding this info!
Several years ago I used an nvidia gfx card, where sched_yield = 2 was the only way to operate it properly over a longer time. But the code has changed so much in between that I won't pinpoint any former scheduler/driver.
I assume that you use the normal kernel & X11 drivers for your Intel gfx ATM, right?
BR, Manuel
Hi Alfred,
many thanks for the detailed explanation; it was time for me to google it (I'm not a programmer).
But what does it mean if all yield_types lead to an error? With the new pf-kernel (your bug fix already included), the error now triggers with yield_type=0 too. So I only have yield_type=2 as a workaround; I just have to check how to set it during boot-up or in the source code. But if your new concept scheduler works differently, maybe the error doesn't trigger there, so don't invest too much time in it. We have (hopefully) Linux 5.0 next week ;)
Regards sysitos
@sysitos:
I always call a script to change openSUSE's defaults once my desktop is up. I know, that's way too old-fashioned. But it leaves me in charge.
Maybe you can place an appropriate script into the systemd folders and have it called during bootup?
Unfortunately, I'm too inexperienced with this.
BR, Manuel
@Manuel,
I use PDS exclusively on all machines I own (and not).
I have placed some tweaks in /etc/rc.local to be executed every time the computer starts; that includes yield_type as well.
BR, Eduardo
@Manuel and Eduardo,
thanks, I had some udev rule in mind. Endless ways to do so in linux ;). But it wouldn't help, see below.
@Alfred
bad news, the error has now been triggered even with the last setting, yield_type=2. So no workaround is possible anymore. Checked it with the newest pf-kernel.
Regards sysitos
Hi Alfred,
here I am again. I know, I'm a little bit insistent ;)
Because there is no workaround for my problem and I don't want to go back to cfs, but on the other hand I need my mail too, I have looked at the problem again and found 2 solutions:
1. solution (the ugly one):
I completely reverted your commit 51d8f8b8 on top of the current pf-kernel. It compiles fine and, even better, it works without the mentioned errors. But I think you wouldn't like it, because of your rework within this commit.
2. solution, the elegant one, so I hope ;)
I checked your commit again and then changed only a single character:
line 463 old: if (p->prio >= NORMAL_PRIO) {
line 463 new: if (p->prio > NORMAL_PRIO) {
Compiled and run fine. Even tested with different yields (0,1,2). Couldn't trigger the error yet, checked different situations. Haven't seen any drawbacks.
Maybe you or someone else could double check it.
Thanks for your help.
Regards sysitos
@sysitos
I kind of expect that change to disable timeslice expiration for "normal tasks". Not that I am a programmer or expert in any way tho :)
I.e. you would never call "update_task_priodl(p);" if a task is running at "Normal prio" (most tasks are).
This would in turn probably work for you, but I am not entirely sure it is elegant for the rest of us? :)
@sveinar
thanks for the clarification; then the only clean solution (for me) is solution 1, the ugly one :/
Alfred handled the normal prio tasks in another way in the old commit, which had no drawback here.
But sorry for the stupid question, maybe you could clarify it a little bit: does that mean that a "normal prio task" would never be refreshed, and the process time assigned within a tick, and only for this tick, would stay the same?
So do you have an example workload where I could check the wrong behavior?
Thanks and regards
Sysitos
Hi (@sveinar),
so I undid my stupid change and modified pds.c (in the spirit of the old commit):
if (p->prio >= NORMAL_PRIO) {
    if (p->prio == NORMAL_PRIO) {
        p->deadline /= 2;
        p->deadline += (rq->clock + task_deadline_diff(p)) / 2;
    } else
        p->deadline = rq->clock + task_deadline_diff(p);
    update_task_priodl(p);
}
This works here, and should be elegant for the rest too ;).
PS: Could all be useless, because Alfred is working on a new scheduler ;)
Regards sysitos
@sysitos
I have to say that none of your changes meets the design intention. 1) Reverting the commit is not a good idea, as there is a bug in the previous deadline calculation; that's why the time slice expiration was reworked. 2) It bypasses the deadline update for NORMAL tasks entirely.
Your last code change looks ok; I'd suggest you change the deadline calculation to
    p->deadline /= 2;
    p->deadline += rq->clock / 2 + task_deadline_diff(p);
If it works for you and your issue, you can keep the code change for yourself. I am not going to make changes to PDS for now, b/c I believe this only fixes particular cases and may break others. I hope you can understand.
In the long term, there will be no deadline concept in the new scheduler, so there is less trouble to worry about. But there will still be the yield problem. We'll see how to handle it later, as it will be a low-priority item.
Hi Alfred,
thx for the code change. It looks better and runs fine (as I wrote, I used the formula from your old commit as a quick hack).
But not reusing the old p->deadline, i.e. ignoring it and recalculating it from scratch, leads for me to the mentioned bug. Had you asked me 2 weeks ago, I would have said that there is no problem for me with PDS, because this hang is really difficult to identify. But this hang has already led to a bug fix that no one else had mentioned.
But anyway, thanks for your help. It's ok for me to do a "quilt import" after a "git pull" too ;). Don't worry about it. I now know the cause and the solution.
But maybe you could please tell me: will the new p->deadline always be bigger than the old one, or will it depend on the situation (load etc.)?
Many Thanks and Regards
sysitos
Any way to implement this in PDS or the new scheduler?
https://github.com/clearlinux-pkgs/linux/blob/master/0123-add-scheduler-turbo3-patch.patch
I did a pre-study of itmt last year on my intel gen8 cpu in a notebook, but it turned out that it doesn't support itmt. I will look at it again on the new scheduler when the principal features are done this year.
What does this clearlinux patch actually do?
It prefers higher clocking cores for tasks.
https://www.intel.com/content/www/us/en/architecture-and-technology/turbo-boost/turbo-boost-max-technology.html