Saturday, September 30, 2017

PDS 0.98a release

PDS 0.98a is released with the following changes:

1. Fix a calculation mistake in task_deadline_level() from the previous release.
2. Reduce policy fairness balance overhead now that the task_deadline_level() calculation is corrected.
3. Refine policy fairness balance.
4. For 32-bit kernels, avoid taking a global lock by preempting only run queues at a lower scheduling level. (32-bit Raspberry Pi should get some love.)
5. Extend the NORMAL policy by one more deadline level.
6. Fix reverted task policy value.

This is a bug-fix release plus some enhancements. Compared to the previous release, there is some performance regression in exchange for some interactivity improvement. Everything should now work as designed.

Enjoy PDS 0.98a for the v4.13 kernel :)

Code is available at
https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-4.13.y-vrq
and also
https://github.com/cchalpha/linux-gc/commits/linux-4.13.y-vrq

An all-in-one patch is available too.

61 comments:

  1. Wonderful work Alfred! Tested for a week, everything ok.
    Very smooth and responsive.
    Merged into XanMod 4.13 branch!
    https://github.com/xanmod/linux/commits/4.13

  2. Thanks Alfred!

    Here are some throughput benchmarks of PDS0.98a :
    https://docs.google.com/spreadsheets/d/163U3H-gnVeGopMrHiJLeEY1b7XlvND2yoceKbOvQRm4/edit?usp=sharing

    Warning! bz2, xz, lame and x264 have been rebuilt since the last round of tests, and gcc has been updated.

    PDS throughput is almost always better than CFS.

    Pedro

  3. Compiled and installed on i7 @ work, Ryzen @ home, so far (~ 24 hrs) it's stable.
    I haven't noticed any measurable performance improvement in games, yet. Compilation takes +/- the same time.

    Thanks Alfred,
    Eduardo & Rinaldo

  4. Hi Alfred, one user reported issues when running wine and when multitasking under heavy load:

    CybDex:

    Had some issues with kernel panic and stuff trying to run wine with the new 4.13.4-xanmod7 kernel.

    Maybe related to the new PDS-mq: Priority and Deadline based Skiplist Multiple Queue CPU Process Scheduler.?

    Also did a compile using make -j8 and all cores peaked at 100% with the system more or less unusable during the compile. I know the -j8 makes use of all cores (including HT ones) on my I7, but the system is usually responsive enough to browse and do regular stuff, but with the xanmod7 it was almost a dead stop. Could it be some setting i need to tune to get the new PDS-mq up and running?

    https://forum.xanmod.org/thread-2-post-2419.html#pid2419

    Replies
    1. dmesg (wine bug):

      [ 59.626065] ------------[ cut here ]------------
      [ 59.626066] kernel BUG at mm/usercopy.c:72!
      [ 59.626068] invalid opcode: 0000 [#1] PREEMPT SMP
      [ 59.626069] Modules linked in: pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) snd_hda_codec_hdmi ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 nvidia_uvm(POE) xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack libcrc32c nvidia_drm(POE) iptable_filter nvidia_modeset(POE) ip_tables intel_rapl x86_pkg_temp_thermal snd_hda_intel x_tables nvidia(POE) intel_powerclamp kvm_intel snd_hda_codec snd_usb_audio kvm snd_hda_core snd_usbmidi_lib snd_hwdep drm_kms_helper snd_pcm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_seq_midi pcbc
      [ 59.626090] snd_seq_midi_event parport_pc eeepc_wmi uvcvideo bnep rfcomm asus_wmi drm snd_rawmidi ppdev videobuf2_vmalloc mxm_wmi wmi_bmof aesni_intel sparse_keymap snd_seq videobuf2_memops bluetooth videobuf2_v4l2 aes_x86_64 snd_seq_device crypto_simd nct6775 videobuf2_core glue_helper ecdh_generic snd_timer cryptd fb_sys_fops videodev hwmon_vid syscopyarea intel_cstate snd joydev media input_leds intel_rapl_perf coretemp sysfillrect sysimgblt mei_me soundcore lpc_ich mei serio_raw lp shpchp parport wmi mac_hid binfmt_misc nls_iso8859_1 hid_generic usbhid hid e1000e psmouse ahci libahci ptp pps_core video
      [ 59.626113] CPU: 1 PID: 3669 Comm: wineserver Tainted: P OE 4.13.4-xanmod7 #1
      [ 59.626113] Hardware name: System manufacturer System Product Name/P8P67 EVO, BIOS 3602 11/01/2012
      [ 59.626114] task: ffff9667c54ae540 task.stack: ffffad95c8bc0000
      [ 59.626118] RIP: 0010:__check_object_size+0x123/0x1c0
      [ 59.626119] RSP: 0018:ffffad95c8bc3ee0 EFLAGS: 00010282
      [ 59.626120] RAX: 000000000000005e RBX: 0000000000000080 RCX: 0000000000000001
      [ 59.626121] RDX: 0000000000000000 RSI: ffffffff9a75cefa RDI: 00000000ffffffff
      [ 59.626122] RBP: ffffad95c8bc3f00 R08: 0000000000000000 R09: 00000000000003bc
      [ 59.626122] R10: 0000000000000008 R11: ffffffff9cdad70d R12: 0000000000000000
      [ 59.626123] R13: ffff966844e48ac0 R14: ffff966844e48a40 R15: 0000000000000e57
      [ 59.626124] FS: 00007f2c00ec3740(0000) GS:ffff96685ec40000(0000) knlGS:0000000000000000
      [ 59.626125] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 59.626126] CR2: 00000000f7d15408 CR3: 00000003b6fb0000 CR4: 00000000000406e0
      [ 59.626127] Call Trace:
      [ 59.626131] SyS_sched_setaffinity+0x64/0x100
      [ 59.626133] entry_SYSCALL_64_fastpath+0x1e/0xa9
      [ 59.626135] RIP: 0033:0x7f2c0119935f
      [ 59.626135] RSP: 002b:00007fff02a3e350 EFLAGS: 00000246 ORIG_RAX: 00000000000000cb
      [ 59.626137] RAX: ffffffffffffffda RBX: 00007f2c0148c760 RCX: 00007f2c0119935f
      [ 59.626137] RDX: 00007fff02a3e420 RSI: 0000000000000080 RDI: 0000000000000e57
      [ 59.626138] RBP: 0000000000000211 R08: 0000000000000001 R09: 0000000000000e57
      [ 59.626139] R10: 00007fff02a3e1e0 R11: 0000000000000246 R12: 0000000000017e51
      [ 59.626140] R13: 0000000000000021 R14: 00007f2c0148c7b8 R15: 0000000001073fa0
      [ 59.626141] Code: 48 0f 45 d1 48 c7 c6 67 fd 6b 9a 48 c7 c1 6c ef 6c 9a 48 0f 45 f1 49 89 d9 49 89 c0 4c 89 f1 48 c7 c7 88 ef 6c 9a e8 ee 87 e8 ff <0f> 0b f3 c3 48 8b 3d 42 46 bc 00 48 8b 0d b3 bf bf 00 be 00 00
      [ 59.626159] RIP: __check_object_size+0x123/0x1c0 RSP: ffffad95c8bc3ee0
      [ 59.626161] ---[ end trace 9f7ca64b4bceeeea ]---

    2. @Alexandre Frade
      For the wine bug, it is too early to say whether it is PDS-related at this point. The bug is reported at mm/usercopy.c:72, in report_usercopy(). If the message printed at line 64 can be captured, we will know which object is being copied, which will tell us more about what is happening.

      As for the compilation load, such workloads should be set to the IDLE policy or given a positive nice value; this lets foreground tasks (browser etc.) win more CPU time. I will double-check whether recent changes affect interactivity among tasks at the same nice level in the NORMAL policy and let you know the result.
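
A minimal sketch of that advice in Python: start the background workload under SCHED_IDLE and fall back to a positive nice value where the policy cannot be set. The helper name `run_demoted` and the default increment of 19 are illustrative assumptions, not part of PDS; only the standard `os` and `subprocess` calls are real, and the sketch is Linux-only.

```python
import os
import subprocess
import sys


def run_demoted(argv, nice_increment=19, **run_kwargs):
    """Run argv as a background-class task (hypothetical helper, Linux only).

    Tries SCHED_IDLE first; if the policy cannot be set, raises the nice
    value instead so that foreground tasks still win CPU time.
    """
    def demote():
        # Executed in the child process right before exec().
        try:
            os.sched_setscheduler(0, os.SCHED_IDLE, os.sched_param(0))
        except (AttributeError, OSError):
            os.nice(nice_increment)

    return subprocess.run(argv, preexec_fn=demote, **run_kwargs)
```

For the case reported above this would be called as `run_demoted(["make", "-j8"])`. From a shell, `chrt -i 0 make -j8` or `nice -n 19 make -j8` should have a similar effect.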

    3. @Alexandre Frade
      After double-checking the recent code changes, I found no interactivity regression in this scenario. I am working on a small change to improve interactivity slightly; it is still under testing. But the most effective way to avoid interactivity lag is to set background workloads to the IDLE policy or give them a positive nice value.

    4. Thank you Alfred for the feedback.
      Your work is of great value. It would be interesting to offer PDS as an option in mainline, alongside CFS.
      I know you would keep up smoothly with the frantic pace of Linux / Torvalds development.

    5. > For the wine bug, it is too early to say whether it is pds related at this point of time.

      Got the same bug with wine. Any news on this?

    6. @Alfred Chen,
      See this thread about related reports.
      https://forum.xanmod.org/thread-192-post-2439.html
      I hope it helps.

    7. > if the print log in line 64 can be captured, we may know which object is copying, which will give more information about what is happening

      How can this be done? Unfortunately, I'm not very familiar with kernel development. I'm using xanmod on Linux Mint 18.

      Thanks for your work!

    8. @unxed
      Just look a few lines above the kernel BUG in the dmesg log; you should find the message printed at line 64.
      @Alexandre Frade
      I need some time to catch up. Just got back home.

    9. @Alfred Chen

      Oct 7 14:57:37 ice kernel: [78867.619601] usercopy: kernel memory overwrite attempt detected to ffff917861f2c798 (kmalloc-8) (128 bytes)
      Oct 7 14:57:37 ice kernel: [78867.619623] ------------[ cut here ]------------
      Oct 7 14:57:37 ice kernel: [78867.619625] kernel BUG at mm/usercopy.c:72!
      Oct 7 14:57:37 ice kernel: [78867.619631] invalid opcode: 0000 [#1] PREEMPT SMP

      This one?

    10. @unxed
      Yes, it is. It may be related to get_user_cpu_mask() in pds.c, which is a very old 2.6.x version inherited from bfs.c.
      Would you please check the kernel config and see if CONFIG_CPUMASK_OFFSTACK is set?
      Once this is sorted out, I will provide a debug patch so you can see if it works for you.

    11. @unxed
      I am not sure why this usercopy log is generated. In SyS_sched_setaffinity it should come from copy_from_user(), but the log indicates it is writing to user space. Something is wrong.
      But anyway, here is the sync-up fix for get_user_cpu_mask(); hopefully it addresses the cause of this issue. Please give it a try.

      I have uploaded the debug patch at https://bitbucket.org/alfredchen/linux-gc/downloads/sync_get_user_cpu_mask.patch
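
The numbers in the logs above are consistent with a missing length clamp: the syscall was entered with a length of 0x80 (RSI in the user-space register dump, 128 bytes), while the usercopy message names a kmalloc-8 destination. The Python sketch below models the clamp that a helper like get_user_cpu_mask() needs before copying. It is an illustration under assumptions, not the contents of sync_get_user_cpu_mask.patch: `clamp_copy_len` is a hypothetical name, and the 8-byte mask size is inferred from the slab name in the log.

```python
def clamp_copy_len(user_len, mask_size):
    """Bytes that may safely be copied from user space into a mask buffer.

    Hypothetical model: never copy more than the destination holds.
    Returns (bytes_to_copy, needs_zero_fill). Zero fill is needed when the
    user buffer is shorter than the mask, so the tail is not left stale.
    """
    if user_len < mask_size:
        return user_len, True
    return mask_size, False


MASK_SIZE = 8    # destination object: kmalloc-8 in the usercopy log
USER_LEN = 0x80  # length argument (RSI) in the wineserver trace: 128 bytes

to_copy, zero_fill = clamp_copy_len(USER_LEN, MASK_SIZE)
print(to_copy, zero_fill)  # 8 False
```

Without such a clamp, the full 128 bytes would be written into the 8-byte object, which is the kind of overwrite the hardened usercopy check reports.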

    12. @Alfred Chen

      > if CONFIG_CPUMASK_OFFSTACK is set?

      How do I check this?

      > Please give it a try.

      Thanks! Btw, as this will be the first time I build a kernel from source, it may take a while to get through it.

    13. @unxed: cat /boot/config-4.13.5-xanmod10 | grep CONFIG_CPUMASK_OFFSTACK

      > CONFIG_CPUMASK_OFFSTACK=y

      A build w/ patch will be available shortly for testing.
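
For anyone who prefers a script to grep, here is a small Python sketch of the same check. The helper name `kernel_config_value` and the sample config text are made up for illustration; on a real system the text would come from a file such as the `/boot/config-...` one used above.

```python
import re


def kernel_config_value(config_text, option):
    """Return the value of a kernel config option, or None if it is not set.

    A line like 'CONFIG_FOO=y' yields 'y'; the '# CONFIG_FOO is not set'
    form (or a missing option) yields None.
    """
    pattern = re.compile(r"^%s=(.*)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


# Made-up sample, not taken from any real kernel build.
sample = """\
CONFIG_SMP=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_CPUMASK_OFFSTACK=y
CONFIG_NR_CPUS=64
"""

print(kernel_config_value(sample, "CONFIG_CPUMASK_OFFSTACK"))  # y
print(kernel_config_value(sample, "CONFIG_PREEMPT_NONE"))      # None
```

To check the running kernel, the text can be read with `open("/boot/config-" + os.uname().release).read()`, assuming the distribution installs the config file there.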

    14. @Alfred Chen

      Wine is now working fine with the patch you provided, thanks!

    15. @Alexandre Frade

      > CONFIG_CPUMASK_OFFSTACK=y

      yep.

    16. This comment has been removed by the author.

    17. This comment has been removed by the author.

    18. Patches:
      1. PDS0.98a + pds: Fix UP compilation issue.
      2. deadline_catch_up.patch
      3. sync_get_user_cpu_mask.patch

      http://deb.xanmod.org/pool/main/l/linux-4.13.5-pds-xanmod4/

      APT repository: sudo apt update && sudo apt install linux-image-4.13.5-pds-xanmod4 linux-headers-4.13.5-pds-xanmod4

    19. @Alexandre Frade

      > sudo apt update && sudo apt install linux-image-4.13.5-pds-xanmod4 linux-headers-4.13.5-pds-xanmod4

      Done that, wine is working perfectly also.

    20. @Alexandre Frade

      Thanks for your work on xanmod!

    21. @unxed @Alexandre Frade
      Cool, this fixes a long-standing hidden bug in PDS (previously in BFS). I will bump the release to 0.98b this weekend. Still looking at the warning from @pf.

    22. @all involved & @Alfred:
      Thank you for your productive cooperation.
      It appears to me, that this catch and the fix for it solves some hassles when using current TuxOnIce (and maybe could solve for earlier versions).
      I'd continue testing without changing kernel or software to come to a safe statement, but so far, it's promising that I don't suffer from either failing resumes from TOI hibernation or protection/ segmentation faults (likely with Firefox) short time afterwards, now for 6 times in a row.
      {A short look into the patch' code reveals that TOI makes use at least of some of the parent functions of the changed one.}

      BR, Manuel Krause

    23. Grrrr... This was a false positive.
      No. 7 utterly failed to resume (with 16 attempts to get it back).

      BR, Manuel Krause

  5. ATM I also want some more clarifications and explanations:
    * What does the "Extend NORMAL policy level to 7 levels" introduced in 0.97b (+) do for us? -- Or how is it configurable?
    * The gkrellm shows nearly only idle load (WCG set to IDLE), but top instead shows normally started FF's WebContent as of 175% CPU (on my dual core)

    BR, Manuel Krause

    Replies
    1. Take a typical case on a dual-core system, for example: 2 mpv threads running in the foreground and 2 compilation jobs running at nice 19.
      Before this feature, the 2 mpv threads might stay on one core while the 2 compilation jobs ran on the other. That is not the scheduling we expect. At that time, the workaround was to set the compilation workload to the IDLE policy, which triggers the policy fairness balance among cores.
      With this feature, there are 7 (8 in the latest code) NORMAL policy levels based on task deadline. Different NORMAL policy levels also trigger the policy fairness balance.
      It's not configurable.

      I don't use gkrellm; top and htop are the tools I use, and both work fine.

    2. Thank you for your explanations. But I'm not confident with the current implementation at all.
      Firefox' WebContent uses much more top-reported CPU% than ever-ever before, what before pds(*) correlated to gkrellm's graphs. Now, gkrellm shows ~99% IDLE load, while top doesn't. And I don't believe it's gkrellm's fault.
      When cutting and saving a video in avidemux (IIRC it's doing it in IDLE mode) it's taking much longer than ever(*) before, maybe it's roughly *6.

      (*) This is the question, when this issue occurred first time. ^^
      ATM I don't have enough patience to do a complete bisect over all your last commits. So maybe you can give me a quick hint for a cherry-picked commit to revert and test the result.

      BR, Manuel Krause

    3. @Manuel
      If gkrellm used to work before and you would like to bisect, here is my suggestion.
      1st, try the 0.97, 0.97a and 0.98 tagged commits on 4.13 and find the last tagged commit that worked.
      2nd, try the commits after the tagged commit found above.

      There have been no CPU utilization code changes recently, but there are task priority changes (for display in htop/top); maybe that is why gkrellm stopped working.

    4. @Alfred:
      Apologies for making the noise -- I've found the culprit and it luckily had nothing to do with your great PDS work!

      Short story long: After re-testing a vrq097b kernel from a time at which I remembered the issue not showing up, and getting the same symptoms, I followed the other idea: that it has to do with Firefox and/or some weird content in one of the tabs. By coincidence I was able to identify one page that had introduced a mining script from coinhive.com ~one week ago without notifying us users. Now it's blocked by ABP -- and I'm back to enjoying the fine foreground interactivity of most recent PDS. :-)

      Thank you for taking care and BR,
      Manuel Krause

    5. @Alfred: (Added info)
      Good to have found the cause of the problem relatively quickly (and not to have had to begin bisecting).
      The PDS is behaving really well and load distribution over cpus also looks correct in both top and gkrellm's graphs. {The mining script's load was apparently counted as "idle" load in gkrellm, while under top normally within Firefox' "Web Content" sub-thread, thus leading to the differences reported.}
      Now I've even increased the FF setting "Content process limit" to 2 related to my 2 cpu cores. Looks well so far, too.

      BR, Manuel Krause

  6. Hey Alfred; x64 built, but x86-UP failed with:
    CC kernel/sched/pds.o
    kernel/sched/pds.c: In function ‘__schedule’:
    kernel/sched/pds.c:3638:5: error: ‘struct rq’ has no member named ‘next_balance’
    rq->next_balance = rq->clock + MS_TO_NS(rr_interval + 1);
    _____^~
    make[2]: *** [scripts/Makefile.build:302: kernel/sched/pds.o] Error 1
    make[1]: *** [scripts/Makefile.build:561: kernel/sched] Error 2
    make: *** [Makefile:1019: kernel] Error 2

    Replies
    1. @jwh7
      Fix already pushed at https://bitbucket.org/alfredchen/linux-gc/commits/ffe965d18f15721257d632efb514bddf52d21626?at=linux-4.13.y-vrq

    2. Built yesterday; thanks Alfred!

  7. The capability to keep the system interactive (low latency) while an Android ROM is compiling is pretty impressive !

    BFS, VRQ, PDS-mq has come a long way !

    Awesome work, Alfred !

    listening to 2 audio tracks at the same time with the ROM compilation in the background and it hardly stutters or gets interrupted for several seconds so: very very close to rt-kernel <3

    Browsing with chromium also is hardly impaired

    Thanks very much :)

  8. @all
    Thanks for the testing and feedback. All sounds promising. That's good news.
    I'd like to introduce a little patch for interactivity, which gives tasks with lower CPU usage an earlier deadline when they are woken. Please try it and see if it helps with interactivity in your usage scenario.
    https://bitbucket.org/alfredchen/linux-gc/downloads/deadline_catch_up.patch
    PS: I will be out of town on holiday for two days and will not be able to check email and replies.

    Replies
    1. @Alfred:
      This one makes a difference in my use scenario -- in a positive way. It seems to speed up KDE menus & interaction and foreground FF. The only things suffering are tasks @IDLE, but only for a moment. I noticed this while re-encoding a video in avidemux (@IDLE), which reports its progress, when interacting with the desktop and FF. "Time remaining" goes up first and then gets back down quickly.
      IMHO, this one could stay in PDS.
      But let's read what thorough-testers + gamers like Eduardo&Son and Pedro find out.

      Thank you, have joyful holidays and BR,
      Manuel Krause

  9. Alfred,

    just got the following warning:

    https://gist.github.com/59a31d63efe186b4291cf94f4c0bc01c

    Happened within 10 secs after resume from suspend-to-RAM.

    This warning (source lines 3160-3163):

    ===
    while (level < preempt_level) {
            if (cpumask_and(&check, &sched_rq_queued_masks[level],
                            &p->cpus_allowed)) {
                    WARN_ON_ONCE(cpumask_test_cpu(cpu, &check));
    ===

    Any idea?

    Replies
    1. Another one: https://gist.github.com/115b276a851e2a237d47288cd54aba53

      This time not after resume, but just during normal workload (music/browsing).

    2. @pf
      This warning indicates that the run queue is out of order; it happens when a task's priority and deadline change while it is queued. I thought it was fixed in the previous release by requeueing tasks whenever priority/deadline changes, but it seems something is still missing.
      If the system keeps running, it does no harm; I will double-check. BTW, does this appear only in this release?
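
To illustrate why a key change on a queued task breaks ordering, here is a toy model in Python. It is a generic sorted list, not PDS's skiplist, and the class and method names are invented for the example: changing a task's deadline in place leaves it at its old position, while remove-then-reinsert (a requeue) keeps the queue sorted.

```python
import bisect


class DeadlineQueue:
    """Toy run queue kept sorted by (priority, deadline); lower runs first."""

    def __init__(self):
        self._keys = []  # sorted list of (priority, deadline, name)

    def enqueue(self, name, priority, deadline):
        bisect.insort(self._keys, (priority, deadline, name))

    def dequeue(self, name):
        self._keys = [k for k in self._keys if k[2] != name]

    def requeue(self, name, priority, deadline):
        # Safe way to change a queued task's key: remove, then reinsert.
        self.dequeue(name)
        self.enqueue(name, priority, deadline)

    def unsafe_update(self, name, priority, deadline):
        # Changes the key in place without moving the entry.
        self._keys = [(priority, deadline, n) if n == name else (p, d, n)
                      for p, d, n in self._keys]

    def is_ordered(self):
        return self._keys == sorted(self._keys)

    def names(self):
        return [n for _, _, n in self._keys]


rq = DeadlineQueue()
rq.enqueue("a", 102, 100)
rq.enqueue("b", 102, 200)
rq.enqueue("c", 102, 300)

rq.unsafe_update("a", 102, 250)     # key changed while queued
print(rq.is_ordered())              # False: "a" still sits at the head

rq.requeue("a", 102, 250)
print(rq.is_ordered(), rq.names())  # True ['b', 'a', 'c']
```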

    3. @pf
      I got another report of this warning from another user on Xeon and Atom CPUs.
      Now I am thinking about deploying a heavy debug patch to trace the last line of code that changed the deadline.
      What's the reproduction rate of this warning on your machines?

    4. @pf
      I have uploaded the first debug patch at https://bitbucket.org/alfredchen/linux-gc/downloads/rq_ofo_debug1.patch
      Please give it a try and send back the printed log; I will think about the next debug patch based on the information from this one.
      Many thanks.

    5. The reproduction rate is something like once in a couple of days.

      Will do ASAP.

    6. Actually, since it is a WARN_ONCE, I get it just once after each reboot, so disregard my previous statement.

      I've applied debug patch and got this: https://gist.github.com/031cd2a419b5a9501d723efaadf240d3

  10. So far the error has not appeared. I will try under different loads.

    Replies
    1. Thanks for the kernel log from the debug kernel. Copying my email reply here to keep @all in the loop.
      ========================
      The dmesg_trap_i5_debug log helps a lot.
      [ 154.028199] pds: 5 - 9, 102, 154526916736 5, 102, 154069767068
      This indicates that the pending task (the 2nd one) in the run queue has a deadline difference almost two times the maximum possible value, which results in a wrong deadline level (9) for it. That's the cause of this issue. But the root cause is whatever gives a task such a large deadline difference.
      If there are more such log lines from the debug build, please send them to me; they will be very helpful.
      ==========================
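
As a quick check of the size of that difference, the short Python sketch below pulls the two large values out of the quoted log line and subtracts them. It assumes these two numbers are the task deadlines in nanoseconds, which fits the 154 s timestamp on the line; the meaning of the smaller fields is not assumed.

```python
import re

LOG_LINE = "[ 154.028199] pds: 5 - 9, 102, 154526916736 5, 102, 154069767068"

# Assumption: the two large numbers are the deadlines (in ns) of the two
# tasks being compared. Shorter numeric fields are ignored.
deadlines = [int(n) for n in re.findall(r"\d{9,}", LOG_LINE)]
diff_ns = abs(deadlines[0] - deadlines[1])

print(diff_ns)                  # 457149668
print(round(diff_ns / 1e6, 1))  # 457.1
```

So the two deadlines are about 457 ms apart.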

    2. @pf @Andrei Lavreniyuk
      Just reproduced the same issue on my side. Please revert deadline_catch_up.patch and see if it helps (almost confirmed), and wait for my deadline catch-up algorithm V2. Everyone should learn mathematics!!

    3. Huh, okay. Anyway, my debug log was just posted above.

  11. I cannot post the error log here because of the site's limit.

    ---
    Your HTML code can not be accepted: Must be no more than 4 096 characters
    ---

    I am reverting deadline_catch_up.patch and will report the result.

    Replies
    1. You can use pastebin.com or workupload.com (latter for even BIG files) for your info.

      BR, Manuel Krause

    2. https://gist.github.com/AndyLavr/7a41978b5ff96ea7e1846c78f683a86a

    3. It's strange, why on i7 I never got this error ...

    4. The i5 works fine, no trap.

      On the Atom processor, the behavior is always the same: a dead hang, without debugging logs, within 5-15 minutes. None of the latest patches affect this behavior. Only disabling PDS solves the problem.

    5. On Atom, the system becomes totally unresponsive (no reaction even to mouse movement) with PDS after opening too many "heavy" tabs in Chrome. After several minutes responsiveness comes back, but everything still works very slowly.

      Don't know if it is the same or not, but this behaviour is reproducible.

    6. > if it is the same or not
      same BUG or not

  12. This comment has been removed by the author.

  13. @all
    New release is out, please test based on the new release. :)
