Thursday, March 23, 2017

VRQ 0.94 release

VRQ 0.94 is released with the following changes:

1. Remove duplicated code
2. Fix compilation issue with CONFIG_CPU_FREQ and CONFIG_IRQ_TIME_ACCOUNTING
3. Remove root_domain and sched_domain, which VRQ doesn't depend on. This removes about 2k LOC of scheduler code.

I ran some tests on an SMT machine for the >100% workload regression issue but, for some reason, I can't reproduce it, so it is now on the long-term watch list while VRQ development keeps going. Enjoy VRQ 0.94 for the v4.10 kernel :)

hrtimer and full nohz support are planned for the next release.

Code is available at
https://bitbucket.org/alfredchen/linux-gc/commits/branch/linux-4.10.y-vrq
and also
https://github.com/cchalpha/linux-gc/commits/linux-4.10.y-vrq

All-in-one patch is available too.

BR Alfred 

28 comments:

  1. Hi @Alfred,

    got this error:

    kernel/built-in.o: In function `generate_sched_domains':
    cpuset.c:(.text+0xb7045): undefined reference to `alloc_sched_domains'
    cpuset.c:(.text+0xb7325): undefined reference to `alloc_sched_domains'
    cpuset.c:(.text+0xb7531): undefined reference to `alloc_sched_domains'
    kernel/built-in.o: In function `cpuset_write_s64':
    cpuset.c:(.text+0xb809d): undefined reference to `sched_domain_level_max'
    make: *** [Makefile:969: vmlinux] Error 1

    This is manjaro 4.10.4 with SMT nice
    vrq-0.93a compiles ok

    Dzon

    1. A quick workaround is to disable CONFIG_CPUSETS in the config file. I'll work on a fix patch ASAP.
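
The workaround above amounts to turning one option off in the kernel `.config`. A minimal sketch of that edit, assuming the usual `.config` convention where a disabled option is written as `# CONFIG_FOO is not set`; the helper name `disable_option` and the sample text are hypothetical, and options that depend on the disabled one would still need to be resolved by the kernel's own config tooling (for example `make olddefconfig`):

```python
import re


def disable_option(config_text, option):
    """Return config_text with `option` written in the 'is not set' form."""
    off_line = f"# {option} is not set"
    # Match either an enabled line (OPTION=...) or an already disabled one.
    pattern = re.compile(
        rf"^(?:{re.escape(option)}=.*|# {re.escape(option)} is not set)$",
        re.MULTILINE,
    )
    if pattern.search(config_text):
        return pattern.sub(off_line, config_text)
    # Option not mentioned at all: append the disabled form.
    sep = "" if (not config_text or config_text.endswith("\n")) else "\n"
    return config_text + sep + off_line + "\n"


# Hypothetical sample, not taken from a real config file.
sample = "CONFIG_SMP=y\nCONFIG_CPUSETS=y\n"
print(disable_option(sample, "CONFIG_CPUSETS"))
```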

    2. I have pushed the fix. Please check it out at
      https://bitbucket.org/alfredchen/linux-gc/commits/cc0ea4363856bd4bc95190ba5b8ae93c3edd5c10?at=linux-4.10.y-vrq

      Please let me know if there are other compilation issues.

    3. Thank you. I applied the mentioned commit and it compiled fine. I'm running the kernel now. It's running quite well and feels very snappy, but I still see the problem with 100% use of all cores. I'm testing it by compiling the kernel with 8 threads on a 4c/8t CPU. I know you are looking into this but can't replicate it. I will try to compare average CPU time and real-world time taken for the job.

      Dzon

    4. Kernel compile with 8 threads on AMD FX-8350
      4.10.4 vanilla:
      8751.30user 644.95system 23:51.55elapsed 656%CPU (0avgtext+0avgdata 358244maxresident)k
      362024inputs+6850720outputs (421major+278310950minor)pagefaults 0swaps

      4.10.4 vrq-094:
      7702.38user 585.55system 34:40.44elapsed 398%CPU (0avgtext+0avgdata 358380maxresident)k
      361208inputs+6850672outputs (428major+278104561minor)pagefaults 0swaps

      4.10.4 vrq-094 nice -19 (forgot to flush caches):
      7683.38user 588.34system 35:16.16elapsed 390%CPU (0avgtext+0avgdata 358376maxresident)k
      72inputs+6850648outputs (2major+278130221minor)pagefaults 0swaps

      4.10.4 vrq-094 (SMT_NICE=n):
      7654.32user 582.49system 35:56.27elapsed 381%CPU (0avgtext+0avgdata 358272maxresident)k
      342168inputs+6850672outputs (424major+278102887minor)pagefaults 0swaps

      Dzon

    5. 4.10.4 ck1:
      9301.46user 283.77system 22:37.44elapsed 706%CPU (0avgtext+0avgdata 358276maxresident)k
      341272inputs+6850584outputs (424major+278108662minor)pagefaults 0swaps

      Dzon
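
The timing lines above can be compared more easily once parsed. A minimal sketch, assuming the lines follow the default GNU `time` layout shown in the comments (`user`, `system`, `[h:]mm:ss` elapsed, `%CPU`); the names `parse_time_line` and `parse_elapsed` are illustrative only:

```python
import re

TIME_RE = re.compile(
    r"(?P<user>[\d.]+)user\s+(?P<system>[\d.]+)system\s+"
    r"(?P<elapsed>[\d:.]+)elapsed\s+(?P<cpu>\d+)%CPU"
)


def parse_elapsed(text):
    """Convert an [h:]mm:ss.ss string to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_time_line(line):
    """Extract user, system and elapsed seconds plus recomputed CPU usage."""
    match = TIME_RE.search(line)
    if match is None:
        raise ValueError("not a GNU time summary line")
    user = float(match["user"])
    system = float(match["system"])
    elapsed = parse_elapsed(match["elapsed"])
    return {
        "user": user,
        "system": system,
        "elapsed": elapsed,
        # CPU usage is (user + system) / wall-clock time.
        "cpu_percent": 100.0 * (user + system) / elapsed,
    }


vanilla = parse_time_line("8751.30user 644.95system 23:51.55elapsed 656%CPU")
vrq = parse_time_line("7702.38user 585.55system 34:40.44elapsed 398%CPU")
print(f"wall-clock ratio vrq/vanilla: {vrq['elapsed'] / vanilla['elapsed']:.2f}")
```

The ratio of elapsed times is the wall-clock slowdown; the recomputed `cpu_percent` should agree with the `%CPU` field that `time` printed.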

    6. Hi, Dzon,
      Would you please send me the output of "dmesg | grep -i vrq" for the 094 kernel? It's strange that it only uses 4 cores/threads.

    7. @Alfred
      dmesg | grep -i vrq

      [ 1.257373] vrq: task 14 has no online cpu to run on.
      [ 1.264039] vrq: task 15 has no online cpu to run on.
      [ 1.267372] vrq: task 16 has no online cpu to run on.
      [ 1.274039] vrq: task 17 has no online cpu to run on.
      [ 1.277372] vrq: task 18 has no online cpu to run on.
      [ 1.280689] vrq: task 19 has no online cpu to run on.
      [ 1.369398] vrq: task 20 has no online cpu to run on.
      [ 1.369420] vrq: task 21 has no online cpu to run on.
      [ 1.369434] vrq: task 22 has no online cpu to run on.
      [ 1.369449] vrq: task 23 has no online cpu to run on.
      [ 1.369469] vrq: task 24 has no online cpu to run on.
      [ 1.369484] vrq: task 25 has no online cpu to run on.
      [ 1.456056] vrq: task 26 has no online cpu to run on.
      [ 1.456079] vrq: task 27 has no online cpu to run on.
      [ 1.456095] vrq: task 28 has no online cpu to run on.
      [ 1.456110] vrq: task 29 has no online cpu to run on.
      [ 1.456127] vrq: task 30 has no online cpu to run on.
      [ 1.456143] vrq: task 31 has no online cpu to run on.
      [ 1.539411] vrq: task 32 has no online cpu to run on.
      [ 1.539422] vrq: task 33 has no online cpu to run on.
      [ 1.539431] vrq: task 34 has no online cpu to run on.
      [ 1.539441] vrq: task 35 has no online cpu to run on.
      [ 1.539458] vrq: task 36 has no online cpu to run on.
      [ 1.539471] vrq: task 37 has no online cpu to run on.
      [ 1.622751] vrq: task 38 has no online cpu to run on.
      [ 1.622761] vrq: task 39 has no online cpu to run on.
      [ 1.622771] vrq: task 40 has no online cpu to run on.
      [ 1.622781] vrq: task 41 has no online cpu to run on.
      [ 1.622798] vrq: task 42 has no online cpu to run on.
      [ 1.622812] vrq: task 43 has no online cpu to run on.
      [ 1.706065] vrq: task 44 has no online cpu to run on.
      [ 1.706077] vrq: task 45 has no online cpu to run on.
      [ 1.706091] vrq: task 46 has no online cpu to run on.
      [ 1.706118] vrq: task 47 has no online cpu to run on.
      [ 1.706138] vrq: task 48 has no online cpu to run on.
      [ 1.706153] vrq: task 49 has no online cpu to run on.
      [ 1.789392] vrq: task 50 has no online cpu to run on.
      [ 1.789403] vrq: task 51 has no online cpu to run on.
      [ 1.789412] vrq: task 52 has no online cpu to run on.
      [ 1.789422] vrq: task 53 has no online cpu to run on.
      [ 1.789439] vrq: task 54 has no online cpu to run on.
      [ 1.789451] vrq: task 55 has no online cpu to run on.
      [ 1.882271] vrq: sched_cpu_affinity_chk_masks[0] smt 0x02
      [ 1.882273] vrq: sched_cpu_affinity_chk_masks[0] coregroup 0x252
      [ 1.882274] vrq: sched_cpu_affinity_chk_masks[1] smt 0x01
      [ 1.882274] vrq: sched_cpu_affinity_chk_masks[1] coregroup 0x252
      [ 1.882275] vrq: sched_cpu_affinity_chk_masks[2] smt 0x08
      [ 1.882275] vrq: sched_cpu_affinity_chk_masks[2] coregroup 0x243
      [ 1.882276] vrq: sched_cpu_affinity_chk_masks[3] smt 0x04
      [ 1.882276] vrq: sched_cpu_affinity_chk_masks[3] coregroup 0x243
      [ 1.882277] vrq: sched_cpu_affinity_chk_masks[4] smt 0x32
      [ 1.882277] vrq: sched_cpu_affinity_chk_masks[4] coregroup 0x207
      [ 1.882278] vrq: sched_cpu_affinity_chk_masks[5] smt 0x16
      [ 1.882278] vrq: sched_cpu_affinity_chk_masks[5] coregroup 0x207
      [ 1.882279] vrq: sched_cpu_affinity_chk_masks[6] smt 0x128
      [ 1.882279] vrq: sched_cpu_affinity_chk_masks[6] coregroup 0x63
      [ 1.882280] vrq: sched_cpu_affinity_chk_masks[7] smt 0x64
      [ 1.882280] vrq: sched_cpu_affinity_chk_masks[7] coregroup 0x63
      [ 3.015970] BFS enhancement patchset VRQ 0.94 by Alfred Chen.

      It uses all 8 cores; sometimes almost all cores are at 100% load at the same time. I think this is mostly at the beginning of the kernel compile job. But it's very inconsistent, and most of the time tools like htop/top show none of the cores fully utilized, though something is still going on on all of them. These tests were done on my home computer (AMD) because it was idle at the time. My work computer (Intel Core i7-920) shows the same behavior, if not worse. I can run tests on that some afternoon. On the other hand, under light load VRQ feels more responsive.

      Dzon

    8. @Dzon
      Thanks for providing the output. The cpumask output should be printed in hex format; this will be fixed in the next release. It's still readable, and the topology setup seems to be just fine and as expected for your AMD CPU (4 physical cores/8 threads?).
      Is the Intel i7-920 showing the same under-utilization issue? And is the system setup identical to the one used for the tests on the CFS kernel?
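
The masks in the log can be decoded by hand or with a few lines of code. A minimal sketch, assuming (as the values in the log suggest) that the digits after the `0x` prefix were printed in decimal rather than hex; `decode_mask` is a hypothetical helper name:

```python
def decode_mask(printed):
    """Return the CPU numbers set in a mask as it appears in the log.

    Assumption: the digits after '0x' are decimal, not hexadecimal.
    """
    digits = printed[2:] if printed.lower().startswith("0x") else printed
    value = int(digits, 10)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


# Masks reported for CPU 0 on the FX-8350 in the log above.
print("smt:", decode_mask("0x02"))
print("coregroup:", decode_mask("0x252"))
```

Under that assumption the `smt` mask for CPU 0 names CPU 1 and the `coregroup` mask names CPUs 2 to 7, so the two masks together cover every other CPU exactly once.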

    9. @Alfred,
      I ran the same test on my work computer when it was idle:

      Kernel compile with 8 threads on Core i7-920 (3GHz)

      4.10.6 vanilla:
      11612.34user 665.46system 30:30.13elapsed 670%CPU (0avgtext+0avgdata 358280maxresident)k
      372328inputs+6856816outputs (377major+277209657minor)pagefaults 0swaps

      4.10.4 vrq-094:
      9140.88user 527.36system 45:15.23elapsed 356%CPU (0avgtext+0avgdata 358268maxresident)k
      341272inputs+6859376outputs (328major+277485697minor)pagefaults 0swaps

      dmesg | grep -i vrq

      [ 0.202513] vrq: task 14 has no online cpu to run on.
      [ 0.209180] vrq: task 15 has no online cpu to run on.
      ..
      [ 0.731316] vrq: task 54 has no online cpu to run on.
      [ 0.731325] vrq: task 55 has no online cpu to run on.
      [ 0.818783] vrq: sched_cpu_affinity_chk_masks[0] smt 0x16
      [ 0.818784] vrq: sched_cpu_affinity_chk_masks[0] coregroup 0x238
      [ 0.818785] vrq: sched_cpu_affinity_chk_masks[0] others 0x65280
      [ 0.818786] vrq: sched_cpu_affinity_chk_masks[1] smt 0x32
      [ 0.818786] vrq: sched_cpu_affinity_chk_masks[1] coregroup 0x221
      [ 0.818787] vrq: sched_cpu_affinity_chk_masks[1] others 0x65280
      [ 0.818787] vrq: sched_cpu_affinity_chk_masks[2] smt 0x64
      [ 0.818788] vrq: sched_cpu_affinity_chk_masks[2] coregroup 0x187
      [ 0.818788] vrq: sched_cpu_affinity_chk_masks[2] others 0x65280
      [ 0.818789] vrq: sched_cpu_affinity_chk_masks[3] smt 0x128
      [ 0.818789] vrq: sched_cpu_affinity_chk_masks[3] coregroup 0x119
      [ 0.818790] vrq: sched_cpu_affinity_chk_masks[3] others 0x65280
      [ 0.818790] vrq: sched_cpu_affinity_chk_masks[4] smt 0x01
      [ 0.818791] vrq: sched_cpu_affinity_chk_masks[4] coregroup 0x238
      [ 0.818791] vrq: sched_cpu_affinity_chk_masks[4] others 0x65280
      [ 0.818792] vrq: sched_cpu_affinity_chk_masks[5] smt 0x02
      [ 0.818792] vrq: sched_cpu_affinity_chk_masks[5] coregroup 0x221
      [ 0.818793] vrq: sched_cpu_affinity_chk_masks[5] others 0x65280
      [ 0.818793] vrq: sched_cpu_affinity_chk_masks[6] smt 0x04
      [ 0.818794] vrq: sched_cpu_affinity_chk_masks[6] coregroup 0x187
      [ 0.818794] vrq: sched_cpu_affinity_chk_masks[6] others 0x65280
      [ 0.818795] vrq: sched_cpu_affinity_chk_masks[7] smt 0x08
      [ 0.818795] vrq: sched_cpu_affinity_chk_masks[7] coregroup 0x119
      [ 0.818796] vrq: sched_cpu_affinity_chk_masks[7] others 0x65280
      [ 1.455612] BFS enhancement patchset VRQ 0.94 by Alfred Chen.

      Dzon

    10. @Dzon
      Would you please send me an email with 4 kernel config files attached (CFS and VRQ kernel configs for the AMD CPU, CFS and VRQ kernel configs for the Intel i7-920)?
      I'd like to compare the kernel configs and see if the issue can be reproduced on my side, and I may send you a debug patch to isolate possible causes in the topology setup (i7).

  2. @Alfred,

    I have been using VRQ on a Skylake laptop and a Phenom desktop as my main driver for a long time, and it mostly behaves ok. I did not comment much as it's just working and there is nothing really to say. Thanks for that. As for 094, after the patch it compiles fine and I'm using it right now.
    There were some hard lockups in both laptop and desktop, but I don't think it's VRQ related as a lot is bleeding edge on my rigs :)

    But I have an interesting question, tho. See, my old Phenom is getting very old and I'm planning to get a Ryzen CPU, dunno when exactly, but in the near future. As I understand it, VRQ implements some sort of sticky task, so it does not just pick the first available CPU/core/thread and migrate tasks left and right, which on Ryzen might be a slight problem because the CCXs are interconnected somewhat differently, which incurs a performance penalty.
    Do you have any thoughts on how well VRQ might fare? This might be a very wide question, sorry for that, but, to me, still worth a shot :)

    regards,
    Eduardo

    1. @Eduardo
      Sorry that I couldn't reply to you during the last weekend.
      To be honest, I don't have an AMD CPU to test the scheduler on, but the scheduler code relies on existing APIs to set up the CPU topology. If these APIs report correct results on Ryzen, it should be ok.
      That's why I asked for the "dmesg | grep -i vrq" output in the reply above: I wanted to check how the CPU topology is set up on Dzon's AMD FX-8350.
      Another problem I can think of is SMT; please check my reply to Pedro below.

    2. @Alfred
      I think my problem might be related to SMT, but I think Eduardo wasn't talking about SMT. There might be one more obstacle with Ryzen (which I am tempted to upgrade to, too) and the mentioned interconnect. From what I gathered from looking through info about this CPU on the net, the interconnect is between two "modules" which themselves are 4C/8T processors. They share caches through that interconnect (its throughput also depends on memory frequency), and there seems to be an additional performance penalty when two or more threads sharing the same memory are running on different modules. Simplified, if I understand it correctly, two forked threads on the same module perform better than when they are running on different modules. SMT-ception?

      Dzon

  3. Thanks Alfred.

    I remember you wrote some time ago about a performance regression with SMT
    (http://cchalpha.blogspot.fr/2016/07/heads-up-performance-regression-over.html).

    So I decided to try older releases of linux kernel and BFS. I also did some tests with SMT disabled in the BIOS.
    It seems that 'regression' was already there in linux 4.3 + bfs465.
    Maybe it is tied to BFS and VRQ design.
    Just writing this to let you know.

    See results under 'SMT Regression' sheet at:

    https://docs.google.com/spreadsheets/d/163U3H-gnVeGopMrHiJLeEY1b7XlvND2yoceKbOvQRm4/edit?usp=sharing

    Pedro

    1. @Pedro
      The SMT regression you refer to is relative to BFS/VRQ themselves, not to mainline CFS.
      I can understand why there is a regression under <100% workload: currently, VRQ has no notion of SMT when picking a CPU. In some scenarios, for example when idle physical cores are available, the scheduler should not choose an SMT sibling CPU to run on.
      What troubles me most is the regression under >=100% workload, but I can't reproduce it even with the mainline CFS scheduler.
      So I have to put this aside and work on a solution for SMT CPUs to address the <100% workload regression once my current work items (hrtimer and full nohz) are done.
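
The idea in the reply above, preferring a fully idle physical core over the SMT sibling of a busy one, can be illustrated with a small user-space sketch. This is not VRQ's code; the function name `pick_idle_cpu` and the sibling table (modelled on the 4c/8t layout in the dmesg output earlier in the thread) are assumptions for illustration:

```python
def pick_idle_cpu(idle, siblings):
    """Pick an idle CPU, preferring one whose SMT siblings are idle too.

    idle:     set of currently idle CPU numbers
    siblings: dict mapping each CPU to its SMT sibling CPUs
    """
    # First pass: a CPU on a fully idle physical core.
    for cpu in sorted(idle):
        if all(sib in idle for sib in siblings.get(cpu, ())):
            return cpu
    # Fallback: any idle CPU, even if its sibling is busy.
    return min(idle) if idle else None


# Hypothetical topology: CPUs (0,1), (2,3), (4,5), (6,7) share a core.
SIBLINGS = {0: [1], 1: [0], 2: [3], 3: [2], 4: [5], 5: [4], 6: [7], 7: [6]}

# CPU 0 is busy, so CPU 1 is passed over in favour of the idle core (2,3).
print(pick_idle_cpu({1, 2, 3}, SIBLINGS))
```

A real scheduler would do this with cpumask operations rather than Python sets; the sketch only shows the selection order.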

    2. @Alfred
      Thanks for the clarifications.

      Pedro

  4. @Alfred,

    Unfortunately I have come to the conclusion that VRQ hangs my 3 computers. First I thought that it may be due to bleeding edge software on my desktop Phenom II, but no.
    So, I have experienced hangs on: Skylake laptop (modesetting driver for intel 530, Ubuntu 17.04), Phenom II desktop (radeon driver & mesa from git, Ubuntu 16.04), Core 2 Duo E8400 (nvidia driver, Ubuntu 16.04). Computers have almost nothing in common, just VRQ. E8400 is the easiest to crash, just start steam download and smth else, crash.

    I can not give You any more details coz I don't have anything in the logs or on the screen, it just silently locks up and that's it. All computers lock up in the same way.
    I'll now try to use mux to see how that fares. I have verified that using mux on E8400 *does solve* the crash problem. In case you're wondering, I build kernels all the same except for the scheduler patch - I either use VRQ or MUX, the rest is identical.

    If You need smth, ask. Do others experience the same?

    regards
    Eduardo

    1. @Eduardo: If you mean MuQSS from Con Kolivas please use this "brand" name. "Mux" is something else.
      Thx, Manuel Krause

    2. @Eduardo
      Do you have the hang issue only with VRQ 0.94? How about previous versions? AFAIK, VRQ used to run fine on your machines.

    3. @Alfred,

      I think it started like with 0.94, at least that's when I noticed this on Phenom and Skylake. On E8400 I don't know, coz 4.10 + [VRQ,MuQSS] were the first custom kernels on that machine.
      As I said previously I thought that crashes are due to bleeding edge stuff on Phenom, but when Skylake crashed and E8400 as well I started wondering what's going on and what's common between those computers.

      regards
      Eduardo

    4. @Alfred,

      I will try to compile a version of kernel w/o BFQ, maybe that is a cause for freeze, then the only custom thing will be VRQ.
      Maybe VRQ triggers some unwanted behaviour in BFQ.
      MuQSS is not really an option for me, coz VM's are freezing with it, with VRQ it's fine however.

      regards
      Eduardo

    5. @Eduardo
      Besides testing the kernel w/o BFQ, I'd suggest going back to VRQ 0.93a, if it is confirmed to be a good version. There are only 3 commits between 0.93a and 0.94, so it should be easier to find out which one contributes to your hangs.

    6. @Alfred,

      there is a little progress: BFQ was not the culprit on the E8400. After setting yield_type to 0, the freezes pretty much went away on that machine.
      I'll set that on Phenom and Skylake as well, let's see how that fares.

      regards
      Eduardo

  5. @Alfred:
    For me everything works quite fine. Including -j2 kernel compiling on dualcore. Both cores loaded equally with idle load in background.
    I've received some segfaults randomly from different processes, though. I blame them on the new openSUSE, as the number of these issues decreased each time I updated it. I'm not sure whether it depends on in-kernel-resume. Currently the number of resumes increases the possibility of failures. Firefox still aborts after 1 1/2 days without evidence.

    BR, Manuel Krause

  6. Bringing back this eaten post. Good to see it is not breaking everything :)

    >>@Alfred:
    >>For me everything works quite fine. Including -j2 kernel compiling on dualcore. Both cores loaded equally with idle load in background.
    >>I've received some segfaults randomly from different processes, though. I blame them on the new openSUSE, as the number of these issues decreased each time I updated it. I'm not sure whether it depends on in-kernel-resume. Currently the number of resumes increases the possibility of failures. Firefox still aborts after 1 1/2 days without evidence.
    >>
    >>BR, Manuel Krause

  7. can you update https://bitbucket.org/alfredchen/linux-gc/downloads/v4.10_vrq094.patch ?

    1. @Anonymous:
      Unless Alfred has added things silently, you only need to add one commit on top of VRQ 0.94. Here is a direct link to the commit's patch: https://github.com/cchalpha/linux-gc/commit/cc0ea4363856bd4bc95190ba5b8ae93c3edd5c10.patch
      You can easily follow new commits on https://github.com/cchalpha/linux-gc/commits/linux-4.10.y-vrq or the like and fetch each of them as a kernel-ready patch by appending .patch to the commit's URL.

      HTH and best regards, Manuel Krause
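
The ".patch" trick described above is a plain string operation on the commit URL. A minimal sketch for GitHub commit URLs; the helper name `commit_patch_url` is hypothetical, and actually downloading the result is left to the tool of your choice:

```python
def commit_patch_url(commit_url):
    """Turn a GitHub commit URL into the URL of its patch."""
    # Drop any query string or fragment, then a trailing slash.
    base = commit_url.split("?", 1)[0].split("#", 1)[0].rstrip("/")
    return base if base.endswith(".patch") else base + ".patch"


url = "https://github.com/cchalpha/linux-gc/commit/cc0ea4363856bd4bc95190ba5b8ae93c3edd5c10"
print(commit_patch_url(url))
```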
