Sunday, July 14, 2019

BMQ 0.98 release

BMQ 0.98 is released with the following changes:

1. Minor comment and code readability changes.
2. BMQ now sets a default RLIMIT_NICE, which lets users with normal privileges set nice levels as low as -10.
3. Introduce __bmq_find_first_bit()/__bmq_find_next_bit() for rq->queue.bitmap, to avoid zero checking.
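
To illustrate the idea behind change #3, here is a minimal sketch of a find-first-bit helper that can skip the "is the bitmap empty?" check. It assumes that one sentinel bit (for example, a bit standing for the idle task) is always set, so the bitmap is never zero. The bit layout, the IDLE_BIT value and the function names are hypothetical and are not taken from the BMQ source.

```python
# Hypothetical layout: bit i set means priority level i has runnable tasks,
# and a lower index means a higher priority. The last bit stands for the
# idle task and is assumed to be always set, so the bitmap is never zero.
IDLE_BIT = 63  # illustrative value, not taken from BMQ


def make_bitmap(levels):
    """Build a bitmap with the sentinel bit plus one bit per busy level."""
    bitmap = 1 << IDLE_BIT
    for level in levels:
        bitmap |= 1 << level
    return bitmap


def find_first_bit_nonzero(bitmap):
    """Return the index of the lowest set bit.

    The caller guarantees bitmap != 0, so no zero check is needed here.
    """
    return (bitmap & -bitmap).bit_length() - 1


if __name__ == "__main__":
    print(find_first_bit_nonzero(make_bitmap([7, 20])))
    print(find_first_bit_nonzero(make_bitmap([])))
```

With the sentinel in place, an otherwise empty run queue simply resolves to the idle level instead of needing a special case.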

In this release there is just one minor improvement (#3) to BMQ itself, but #2 improves the user experience. Users with normal privileges can now promote interactivity-focused tasks to nice levels as low as -10. For example, I like to use "nice --5 firefox" to run Firefox so that heavy nice level 0 tasks in the terminal won't impact it. Usually, users with normal privileges can't set a nice level lower than 0, but it is a nice-to-have privilege when using BMQ.
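
For reference, here is a small sketch of how an RLIMIT_NICE value maps to the lowest nice level a task may request. On Linux the ceiling is documented as 20 - rlim_cur, so a soft limit of 30 corresponds to nice -10. The helper name lowest_nice() is made up for this example, and the script only reads the limit of the current process.

```python
def lowest_nice(rlim_cur):
    """Lowest nice level allowed by an RLIMIT_NICE soft limit.

    Linux computes the ceiling as 20 - rlim_cur; the result is clamped
    to the valid nice range [-20, 19].
    """
    return max(-20, min(19, 20 - rlim_cur))


if __name__ == "__main__":
    import resource  # Unix only; RLIMIT_NICE is Linux specific

    soft, hard = resource.getrlimit(resource.RLIMIT_NICE)
    if soft == resource.RLIM_INFINITY:
        print("RLIMIT_NICE is unlimited, nice -20 is allowed")
    else:
        print("RLIMIT_NICE soft limit:", soft)
        print("lowest nice level allowed:", lowest_nice(soft))
```

Running it as a normal user shows whether "nice --5 firefox" would be permitted on the running kernel.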

To explain the above use case further: since BMQ applies a +/-4 automatic nice level adjustment, Firefox and the normal nice level 0 tasks will still overlap in scheduler level some of the time, but the results turn out very good. If no overlap at all is required, nice level -10 can be used for Firefox.
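
The overlap can be worked out with a few lines of arithmetic. This is a simplified model that treats the scheduler level as the nice level plus or minus the automatic adjustment of 4; the real mapping inside BMQ may differ, and the function names are made up for this example.

```python
AUTO_ADJUST = 4  # the +/-4 automatic adjustment mentioned above


def effective_range(nice, adjust=AUTO_ADJUST):
    """Range of levels a task can reach in this simplified model."""
    return (nice - adjust, nice + adjust)


def overlap(a, b):
    """Return the shared (low, high) range of two ranges, or None."""
    low = max(a[0], b[0])
    high = min(a[1], b[1])
    return (low, high) if low <= high else None


if __name__ == "__main__":
    normal = effective_range(0)
    for firefox_nice in (-5, -10):
        firefox = effective_range(firefox_nice)
        print(firefox_nice, firefox, overlap(firefox, normal))
```

In this model a task at nice -5 still shares part of its range with nice 0 tasks, while a task at nice -10 shares none.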

Enjoy BMQ 0.98 for the v5.2 kernel :)

The full kernel tree repository can be found at https://gitlab.com/alfredchen/linux-bmq
The all-in-one patch can also be found on GitLab.

Please report bugs at https://gitlab.com/alfredchen/bmq/issues.

18 comments:

  1. Up and running on multiple machines no problems so far.

  2. Runs well, but I think 0.97 was faster.

    Replies
    1. There is just the #2 minor change to BMQ itself; it shouldn't make any human-noticeable difference. :)

  3. This comment has been removed by the author.

    Replies
    1. Here is a debug patch to enable CGROUP_CPUACCOUNTING. Please give it a test; it's not guaranteed to work.
      https://gitlab.com/alfredchen/bmq/blob/master/5.2/enabled_CGROUP_CPUACCOUNTING.patch

    2. Thanks for the response. Yes, it works :)
      I noticed that cpuacct is also disabled in other CPU schedulers, like MuQSS, so I figured there was a reason for it and decided to delete my post above xD.
      I'll continue to use BMQ with this patch and see how it works over the next few days.

  4. My kernel compilation test on a Ryzen 5 2600:

    # bmq0.98 500hz NO_HZ_FULL
    real 5m14,579s
    user 33m45,957s
    sys 3m48,909s

    # muqss0.193-smt 100hz NO_HZ_FULL
    real 4m13,836s
    user 41m25,789s
    sys 2m29,432s

    # muqss0.193-mc 100hz NO_HZ_FULL
    real 4m13,132s
    user 42m56,970s
    sys 2m38,041s
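
    One way to read these timings is to convert them to seconds and divide CPU time (user + sys) by wall-clock time, which gives the average number of busy CPUs during the build. The sketch below does that for the three runs above; the helper names are made up, and the parser only handles the "XmY,ZZZs" format shown here.

```python
import re

# Timings copied from the three runs above: (real, user, sys).
RESULTS = {
    "bmq0.98 500hz": ("5m14,579s", "33m45,957s", "3m48,909s"),
    "muqss0.193-smt 100hz": ("4m13,836s", "41m25,789s", "2m29,432s"),
    "muqss0.193-mc 100hz": ("4m13,132s", "42m56,970s", "2m38,041s"),
}


def parse_time(text):
    """Convert 'XmY,ZZZs' (comma or dot as decimal mark) to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)[,.](\d+)s", text)
    if match is None:
        raise ValueError("unexpected time format: " + text)
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + float(seconds + "." + fraction)


def busy_cpus(real, user, sys_time):
    """Average number of CPUs kept busy: (user + sys) / real."""
    return (parse_time(user) + parse_time(sys_time)) / parse_time(real)


if __name__ == "__main__":
    for name, times in RESULTS.items():
        print(f"{name}: {busy_cpus(*times):.2f} busy CPUs on average")
```

    For these figures it works out to roughly 7.2 busy CPUs for BMQ against about 10.4 and 10.8 for the two MuQSS runs on this 12-thread CPU, so the BMQ build seems to leave more CPU time idle.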

    Replies
    1. Interesting. On my test platforms, there is no sign of regression in the sanity tests from release to release, nor compared to the mainline CFS scheduler.

      So, do other Ryzen users have similar issues?
      And would you please post the output of "dmesg | grep -i bmq" so I can check the CPU topology setup for BMQ?

    2. [ 0.167088] bmq: BMQ CPU Scheduler 0.98 by Alfred Chen.
      [ 0.461691] bmq: cpu #0 affinity check mask - smt 0x00000040
      [ 0.461691] bmq: cpu #0 affinity check mask - coregroup 0x000001c6
      [ 0.461691] bmq: cpu #0 affinity check mask - core 0x00000e38
      [ 0.461691] bmq: cpu #1 affinity check mask - smt 0x00000080
      [ 0.461691] bmq: cpu #1 affinity check mask - coregroup 0x000001c5
      [ 0.461691] bmq: cpu #1 affinity check mask - core 0x00000e38
      [ 0.461691] bmq: cpu #2 affinity check mask - smt 0x00000100
      [ 0.461691] bmq: cpu #2 affinity check mask - coregroup 0x000001c3
      [ 0.461691] bmq: cpu #2 affinity check mask - core 0x00000e38
      [ 0.461691] bmq: cpu #3 affinity check mask - smt 0x00000200
      [ 0.461691] bmq: cpu #3 affinity check mask - coregroup 0x00000e30
      [ 0.461691] bmq: cpu #3 affinity check mask - core 0x000001c7
      [ 0.461691] bmq: cpu #4 affinity check mask - smt 0x00000400
      [ 0.461691] bmq: cpu #4 affinity check mask - coregroup 0x00000e28
      [ 0.461691] bmq: cpu #4 affinity check mask - core 0x000001c7
      [ 0.461691] bmq: cpu #5 affinity check mask - smt 0x00000800
      [ 0.461691] bmq: cpu #5 affinity check mask - coregroup 0x00000e18
      [ 0.461691] bmq: cpu #5 affinity check mask - core 0x000001c7
      [ 0.461691] bmq: cpu #6 affinity check mask - smt 0x00000001
      [ 0.461691] bmq: cpu #6 affinity check mask - coregroup 0x00000187
      [ 0.461691] bmq: cpu #6 affinity check mask - core 0x00000e38
      [ 0.461691] bmq: cpu #7 affinity check mask - smt 0x00000002
      [ 0.461691] bmq: cpu #7 affinity check mask - coregroup 0x00000147
      [ 0.461691] bmq: cpu #7 affinity check mask - core 0x00000e38
      [ 0.461691] bmq: cpu #8 affinity check mask - smt 0x00000004
      [ 0.461691] bmq: cpu #8 affinity check mask - coregroup 0x000000c7
      [ 0.461691] bmq: cpu #8 affinity check mask - core 0x00000e38
      [ 0.461691] bmq: cpu #9 affinity check mask - smt 0x00000008
      [ 0.461691] bmq: cpu #9 affinity check mask - coregroup 0x00000c38
      [ 0.461691] bmq: cpu #9 affinity check mask - core 0x000001c7
      [ 0.461691] bmq: cpu #10 affinity check mask - smt 0x00000010
      [ 0.461691] bmq: cpu #10 affinity check mask - coregroup 0x00000a38
      [ 0.461691] bmq: cpu #10 affinity check mask - core 0x000001c7
      [ 0.461691] bmq: cpu #11 affinity check mask - smt 0x00000020
      [ 0.461691] bmq: cpu #11 affinity check mask - coregroup 0x00000638
      [ 0.461691] bmq: cpu #11 affinity check mask - core 0x000001c7
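
      The masks are easier to compare once they are decoded into CPU numbers. Below is a minimal sketch, assuming bit N of a mask stands for CPU N; the function name is made up, and the sample values are the three masks printed for cpu #0 above.

```python
def cpus_in_mask(mask):
    """Return the CPU numbers whose bits are set in an affinity mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    # Masks reported for cpu #0 in the dmesg output above.
    cpu0_masks = {
        "smt": 0x00000040,
        "coregroup": 0x000001C6,
        "core": 0x00000E38,
    }
    for level, mask in cpu0_masks.items():
        print(level, cpus_in_mask(mask))
```

      Under that assumption, cpu #0 has CPU 6 as its SMT sibling, CPUs 1, 2, 7 and 8 (plus 6 again) in the coregroup mask, and CPUs 3, 4, 5, 9, 10 and 11 at the "core" level.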

    3. And with MuQSS I get higher CPU usage and higher, more stable FPS.

    4. Please check with mainline CFS. I'll double-check coregroup scheduling in some cases.

    5. # cfs default (not tuned) 500hz NO_HZ_FULL
      real 4m1,772s
      user 36m55,804s
      sys 4m2,401s

      Hm, I didn't think it would be faster than MuQSS.
      All CPU schedulers were tested under the same conditions.

    6. I guess you passed "-j12" for the kernel compilation, right?
      All my machines have just a two-level topology: "smt" and "coregroup". That differs from the Ryzen 5 2600, which has an additional "core" level. My best guess is that BMQ fails to sustain CPU load across the "core" level CPUs, but it's hard to tell from the code.
      Please report a bug at https://gitlab.com/alfredchen/bmq/issues; I'd like to see whether other Ryzen 5 (or similar topology) users report similar issues.
      Meanwhile, I will try to reproduce it using QEMU with a proper CPU topology setup.

    7. >> I guess you pass "-j12" for the kernel compilation, right?
      yes
      >> Pls help to report a bug at https://gitlab.com/alfredchen/bmq/issues
      done: https://gitlab.com/alfredchen/bmq/issues/7

    8. Con Kolivas recommends the whole -ck patchset rather than only the MuQSS patch, and it is intended to run with CONFIG_NO_HZ_IDLE as per this comment: http://ck-hack.blogspot.com/2018/12/linux-420-ck1-muqss-version-0185-for.html?showComment=1549056997845#c6415843931930386538 , so I was wondering whether you had done the same comparison with those settings?

    9. I have an overclocked Ryzen 1700. I'll try to reproduce the issue on Manjaro, which is full tickless by default, so I can't check tickless idle without recompiling both the stock kernel and my own.
      Let's see how much time I have to compile everything for tests :)

      Btw, Ryzen differs from Intel in its so-called core complexes (CCX): 4 cores in each CCX, with multiple CCXs in one Ryzen chip.
      Keeping tasks scheduled within one CCX avoids any penalty, but when tasks communicate across CCXs, higher latency is involved.

      My observation is that CK does not address this issue either, so the queue count is off when running MuQSS on Ryzen, like 17 on 8 cores or 25 on 4 cores. Users report different counts and performance numbers. This somehow doesn't seem right, but I'm just an observer here :)

      BR, Eduardo

    10. I have done testing on Manjaro, and there seems to be a regression. I'll post details to GitLab soon.
      The thing is, and I don't know the reason, on Ubuntu the 5.2+BMQ+nohz_full kernel builds even faster (11 minutes). The same kernel tree with the same config and the same gcc version was tested on Manjaro and Ubuntu, but I'll discard that result for now until I'm sure why.
      Another thing is that the 5.2 kernel or BMQ hangs on Ryzen and on i7 as well, more frequently on Ryzen. I haven't figured out whether that's 5.2 or BMQ.

      BR, Eduardo
