Sunday, March 28, 2021

Project C v5.11-r3 release

Project C v5.11-r3 is released.

Sync-up work for v5.12 will begin soon. ;)

28 comments:

  1. The https://gitlab.com/alfredchen/projectc/-/blob/master/5.11/prjc_v5.11-r3.patch seems to be an empty file.

    Anyway, thank you for your work :)

    Replies
    1. Thanks for reporting that. I have fixed it and rewritten the release notes.

  2. @Eduardo:
    Hi,
    I've seen in
    https://gitlab.com/alfredchen/projectc/-/issues/21
    that you use PDS from PrjC with 5.11.11.
    Do you see advantages of PDS vs. BMQ ?

    TIA,
    Manuel

    Replies
    1. Yes, I have been using PDS for some time now. The reason, if I remember correctly, was some edge cases: when there was a large load on the system, the Electron apps were just slow with BMQ. That includes Atom, Skype, MS Teams and maybe Chromium too.
      It's been a while now, so I can't say 100% for sure, but I remember very slow scrolling with them under load.

      So I tried PDS and I think it's better as I don't recall any slowdowns, but I can't guarantee that I ran the same load with PDS too.
      Since PDS has been revived, I'll probably stick with it. I'll try out BMQ too if there are some adjustments.

      BR,
      Eduardo

    2. Thank you for your explanations, Eduardo!
      This encourages me to try PDS, as soon as the current issue-fix testing is done.
      (I've also seen some very rare mouse lagging under heavy load, but I atm. use both PS/2 Trackman + keyboard via one USB adapter, and I sometimes test different performance levels, so maybe it's not directly comparable.)

      May I ask which adjustments you would like to see in BMQ?

      BR, Manuel

    3. It's hard to say which adjustments; as a user, I would just like to see everything running smoothly :)
      Alfred is right, all frameworks are probably designed with the standard scheduler in mind and behave accordingly, but I need those apps to work properly anyway :)

      About features: these days we have plenty of CPUs, and I have this idea about running an app on dedicated cores. Say I want an app, like a game, to run on 4 dedicated CPUs for maximum speed and no interference from other tasks. I would like every other process to move off those 4 CPUs, leaving only the game and its child processes there, and the game's processes / threads should be properly scheduled within those 4 dedicated CPUs. If a new task spawns while the game is running, it should not get scheduled on those 4 dedicated CPUs.
      When the game ends, I would like all 4 CPUs back in action for the rest of the tasks, as before.
      This doesn't have to be all automatic, I'm just talking about the idea :)

      None of the current solutions I'm aware of provide this, and I don't know if it's even feasible or whether it would work well from the scheduler's perspective :)
      I have not thought about how much performance can be gained this way; if the gain looks very small and the work required is way too much, let's leave this as an idea :)

      Maybe Alfred could comment on this theoretical idea, whether it even makes sense for something like this.

      BR,
      Eduardo

    4. Good topic here. I'd like to join and reply one by one.
      For BMQ vs PDS: BMQ uses the most ideal algorithm, based purely on task priority, and it works fine on a small system. But on a large system running applications that were designed for the mainline Linux scheduler, it may run into priority issues.

      PDS is based on priority and deadline, which gives it more overhead than BMQ, but by design it guarantees that priority issues like BMQ's do not occur.

      I am using PDS on my system right now, just because it is a newcomer and needs more attention. BMQ should be fine on my small system.

      Also, I recently had an idea for replacing the skiplist in PDS; I believe it will reduce some of the overhead compared to BMQ. I need to think it through carefully before working on it.

    5. For Eduardo's idea, the current kernel already provides a mechanism for this purpose. Pls check the isolcpus kernel parameter. I need to double-check whether it is supported in the Project C schedulers, but the basic idea is to reserve some CPUs using isolcpus and then use the set_cpus_allowed_ptr() API to assign these CPUs to the tasks dedicated to them.
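
From user space, the counterpart of the in-kernel set_cpus_allowed_ptr() call is the sched_setaffinity(2) system call, which taskset(1) wraps. The "reserve with isolcpus, then place tasks by hand" idea could be sketched in shell as below. This is only a sketch under assumptions: it assumes the kernel exposes the isolated CPU list in /sys/devices/system/cpu/isolated and that util-linux taskset is installed. The function names and the CPU_SYSFS override are made up for this example, and the behaviour under the Project C schedulers is not verified here.

```shell
# Print the CPU list reserved via isolcpus= (empty if none).
# CPU_SYSFS is a hypothetical override, used only for dry runs.
isolated_cpus() {
    cat "${CPU_SYSFS:-/sys/devices/system/cpu}/isolated" 2>/dev/null
}

# Start a command with its affinity restricted to the isolated CPUs.
run_on_isolated() {
    _cpus=$(isolated_cpus)
    if [ -z "$_cpus" ]; then
        echo "no isolated CPUs found; was isolcpus= set?" >&2
        return 1
    fi
    # taskset sets the affinity mask, then executes the command.
    taskset -c "$_cpus" "$@"
}
```

A call such as "run_on_isolated ./somegame" (a placeholder command) would then start the program on the reserved CPUs only. Tasks placed this way are not load-balanced across the isolated CPUs, so spreading several of them out is still manual work.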

    6. On my side PDS plays very well and I don't see much CPU utilization in my usual use on the new machine (4 cores, 8 threads). BMQ seems to use a little more. OTOH, I haven't stressed PDS with heavy use cases so far.

      Eduardo's idea of dedicating tasks (or tasks + their child processes) to CPUs/threads (or groups of them) sounds very interesting to experiment with.
      Unfortunately, I apparently have neither enough knowledge of nor practical experience with the CGROUPS subsystem, which I had assumed until now would implement such possibilities (e.g. with CPUSETS).
      Or is this just not yet supported by BMQ / PDS?

      Enlighten me, please! :-)

      Manuel

    7. ;-) This overlapped... different people thinking about the same topic at the same time.
      Manuel

    8. The file /usr/src/linux/Documentation/admin-guide/cgroup-v1/cpusets.rst provides info and useful examples.
      Atm. I am trying to migrate userspace chromium to a dedicated cpuset.

      ALUHEAD:~ # cd /sys/fs/cgroup/cpuset/
      ALUHEAD:/sys/fs/cgroup/cpuset # mkdir Charlie
      ALUHEAD:/sys/fs/cgroup/cpuset # cd Charlie
      ALUHEAD:/sys/fs/cgroup/cpuset/Charlie # /bin/echo 6-7 > cpuset.cpus

      We need some memory attached, otherwise it won't work at all and fail with:
      /bin/echo: write error: No space left on device
      So:
      ALUHEAD:/sys/fs/cgroup/cpuset/Charlie # /bin/echo 0 > cpuset.mems

      Let's see if it works out of the box.

      Thank you for your inspiration!
      BR, Manuel
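
The manual steps above can be wrapped in a small shell function. This is a minimal sketch, assuming a cgroup-v1 cpuset hierarchy mounted at /sys/fs/cgroup/cpuset as in the session above. The function name and the CPUSET_ROOT override are made up for this example, and memory node 0 is only a default that fits a single-node machine.

```shell
# Create a cpuset and give it CPUs and a memory node.
# CPUSET_ROOT is a hypothetical override, handy for dry runs.
make_cpuset() {
    _root="${CPUSET_ROOT:-/sys/fs/cgroup/cpuset}"
    _dir="$_root/$1"
    mkdir -p "$_dir" || return 1
    # Both cpuset.cpus and cpuset.mems must be set before tasks
    # can be attached; otherwise writes to "tasks" are rejected.
    echo "$2" > "$_dir/cpuset.cpus" || return 1
    echo "${3:-0}" > "$_dir/cpuset.mems" || return 1
    # Print the directory so callers can reuse the path.
    echo "$_dir"
}
```

For example, "make_cpuset Charlie 6-7 0" repeats the commands shown above. A cpuset directory can be removed again with rmdir, but only once its tasks file is empty and it has no child cpusets.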

    9. So...
      Somehow it seems to work. I've kicked chromium over to CPUs 5-6, and most of the utilization during in-browser video playback now shows up there. The multiple child processes also appear to spawn there.

      So far I haven't been able to remove previously set up cpusets, and I also noticed that other processes may appear in the cpusets of processes that have quit (e.g. an older chromium session).
      Maybe "/bin/echo 1 > cpuset.cpu_exclusive" can be of help.

      Maybe these are just some newbie troubles with this topic.

      BR, Manuel

    10. Followup: Failure is most likely due to my misuse of one "/bin/echo $$ > tasks" in a root konsole for testing one or two directories.
      Unfortunately the process is not killable and neither are the cpusets removable.

      Manuel

    11. Yeah, cool! The settings for dedicated CPUs also survive hibernation/ suspend without issues.
      This is with v5.11.15 + PDS & the recent pending fix patches.

      @Eduardo: Your desired functionality already seems to work properly with CGROUPS & CPUSETS. If you don't make my newbie errors..

      Now I'd only have to script something for bash to adjust the "pidofproc" output to echo into /sys/fs/cgroup/cpuset/XYZ/tasks.

      Manuel
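
Such a helper could look roughly like the sketch below. It is an illustration, not a tested setup: the function name and the CPUSET_ROOT override are made up, and it assumes the cgroup-v1 layout used earlier in this thread, where the tasks file takes one ID per write.

```shell
# Move the given PIDs into an existing cpuset and print how many
# writes succeeded. CPUSET_ROOT is a hypothetical override.
move_to_cpuset() {
    _tasks="${CPUSET_ROOT:-/sys/fs/cgroup/cpuset}/$1/tasks"
    shift
    if [ ! -e "$_tasks" ]; then
        echo "no such cpuset tasks file: $_tasks" >&2
        return 1
    fi
    _moved=0
    for _pid in "$@"; do
        # One ID per write; a failed write is reported and skipped.
        if echo "$_pid" > "$_tasks"; then
            _moved=$((_moved + 1))
        else
            echo "could not move $_pid" >&2
        fi
    done
    echo "$_moved"
}
```

It could be fed from pidof, e.g. "move_to_cpuset XYZ $(pidof chromium)", where the process name is a placeholder. In cgroup v1, writing a PID to tasks moves only that one thread; writing it to cgroup.procs instead moves the whole thread group, which may be the better fit for multi-threaded programs.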

    12. I already tried the isolcpus parameter together with nohz_full quite some time ago and the game ended up a jittery mess :) It microstuttered a lot.
      In addition to that, tasks on isolcpus do not get balanced, so multiple of them could end up on the same CPU if not placed carefully by hand.

      I should have been prepared better for this at least on mainline kernel :)
      I did not even look into cpusets/cgroups as I was under the impression that PDS/BMQ did not support them :) However, Manuel says it's working :)

      So yesterday I had a brief moment of time available and tried cpusets/cgroups on PDS using "cset shield". It sort of worked, but according to htop some tasks were still using the cores I had isolated. I have to test this more, of course.

      But before that, I have a question for Alfred: are cpusets/cgroups fully supported in BMQ/PDS, so that we can safely use CPU shielding?

      BR,
      Eduardo

    13. cpusets/cgroups are not fully supported; most of it is dummy API implementation.

      isolcpus should work out of the box, as I just checked. And indeed, it requires users to plan the placement of tasks on CPUs carefully by hand.

    14. Mmmh, then I don't understand, why the basic functionality with the commands shown above works for me:
      To reassure myself, I just tested limiting an avidemux_qt5 video recoding process to CPU threads 4-7 with the cpuset interface (without external apps and without touching isolcpus). Looking at the gkrellm display, more than 95% of the CPU load went to threads 4-7. I have no other explanation than that it works.

      Or, have I misunderstood something?

      BR, Manuel

    15. CPU affinity should be basic functionality that is inherited from the parent; I believe that's why it works.
      Without isolcpus, other tasks, which inherit their affinity from init (pid 0), will still be able to run on CPUs 4~7.

    16. I was able to run some tests yesterday with cpusets with mainline, mainline + nohz_full, PDS and BMQ.
      My findings from yesterday and the day before are in line with what Alfred said. Basically, "cset shield" sets the affinity for the processes, their children inherit it from the parent, and that helps with the overall situation.

      About tests, I simply ran Unigine Valley with and without cpusets. All kernels were built with 500Hz frequency.

      The fastest was BMQ (this is in line with my previous tests). To my surprise, BMQ + nohz_full + cpusets gave the best result, which contradicts my previous findings with nohz_full; maybe I just messed stuff up previously :)

      Next fastest was PDS, then mainline and mainline + nohz_full.
      Mainline gained about 2% from cpusets, BMQ / PDS gained not that much, results were closer. Even nohz_full performed quite well with BMQ, even Doom Eternal was smooth AF with nohz_full. I'll be trying to use BMQ + nohz_full by default now for testing purposes.

      Please note that I tested just one benchmark, single-threaded. I would have to run many more benchmarks, with other tasks in parallel, to find out whether cpusets actually influence the results in a meaningful way.
      I'm afraid I won't have that much time for that in the foreseeable future.

      So I'll be field testing nohz_full + BMQ on day-to-day tasks, compilations and sometimes games with cpusets (I don't play much, but sometimes I do).

      BR,
      Eduardo

    17. @Eduardo:
      Thank you for your work!
      Excuse me, but can it be that you've forgotten to mention the cpusets setup for your benchmarking tasks?
      If not clarified, your results look like a bunch of appreciated spring flowers. :-D

      BR, Manuel

    18. I have a Ryzen 1700, which has 2 CCXs (core complexes with separate L3 caches), so I just isolated the second CCX using "sudo cset shield --cpu 4-7,12-15 --kthread=on".
      Then if needed, change cpuset ownership to your user.

      I ran my tests using "cset shield -e somesupercommand".

      When using nohz_full, I passed kernel parameter nohz_full=4-7,12-15 too.

      BR,
      Eduardo
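
The same CPU list (4-7,12-15) is passed to both "cset shield --cpu" and nohz_full=, so it can help to expand it once and check that it really covers the intended cores. The helper below is a small sketch for that; its name is made up for this example, and it only handles plain numbers and N-M ranges, not every form of list the kernel accepts.

```shell
# Expand a CPU list such as "4-7,12-15" into single CPU numbers.
# The body runs in a subshell so the IFS change stays local.
expand_cpulist() (
    IFS=,
    _out=""
    for _part in $1; do
        case "$_part" in
            *-*) _lo=${_part%-*}; _hi=${_part#*-} ;;
            *)   _lo=$_part; _hi=$_part ;;
        esac
        _i=$_lo
        while [ "$_i" -le "$_hi" ]; do
            # Append with a single space between entries.
            _out="$_out${_out:+ }$_i"
            _i=$((_i + 1))
        done
    done
    echo "$_out"
)
```

With the list from the comment above, "expand_cpulist 4-7,12-15" prints "4 5 6 7 12 13 14 15", i.e. eight logical CPUs.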

    19. @Eduardo:
      The cset command set is a real mess. Who decided to split the commands and docs into approx. 570 pieces for only one purpose? :-((

    20. @Eduardo:
      That definitely wasn't meant to sound impolite. (I was just overwhelmed by the number of different man pages for cset.)
      I, of course, thank you very much for your advice and information!

      BR, Manuel

  3. > cpu affinity should be basic functionality which inherit from parent, I believe that's why it works.
    > Without isocpus, other tasks which inherit affinity from init(pid 0), will still able to run on cu 4~7.

    Yes, this is what I can observe here. Setting /sys/fs/cgroup/cpuset//cpuset.cpu_exclusive to 1 isn't sufficient to isolate the CPU -- it only isolates the processes and children. At this point I can't understand the note in kernel-parameters.txt regarding isolcpus: "[Deprecated - use cpusets instead]".

    I haven't read all relevant(?) web info regarding cpusets yet... But I caught one idea, to put all processes into separate cpusets, like containers, plenty of possibilities (e.g. 0-1 for base processes, 2-3 for browsing, 4-... etc.).

    Maybe the most stupid question of the year, but the most important for me: How do I put init (PID 0) into a limited cpuset ?

    TIA and BR,
    Manuel

    Replies
    1. > Maybe the most stupid question of the year, but the most important for me: How do I put init (PID 0) into a limited cpuset ?
      isolcpus can do this. It kicks in at a very early stage, in sched_init_smp(); the code below sets the CPUs for init (pid 0).

      /* Move init over to a non-isolated CPU */
      if (set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_FLAG_DOMAIN)) < 0)
              BUG();

      Another possible way is the "auto group" feature; maybe the limited CPUs can be set at that point, or that group can be controlled later, after system boot-up.

    2. Just to get your concept right: As an example, adding "isolcpus=2-7" to the kernel command line would leave CPUs 0 & 1 open for init and its children? And further cpusets can then be configured later (either with cpusets directly or with the more convenient cset)?

      SCHED_AUTOGROUP 'Depends on: !SCHED_ALT [=y]', and in addition to this, its current concept doesn't sound convincing for Eduardo's and my intentions, IMO.
      Let's see how far I get, when I'm allowed to reboot again, after issue/23 testing finished. :-)

      Many many thanks to you both, @Eduardo and @Alfred, for this discussion. I've learned quite a lot about a topic that I was interested in for long time. Nice to have you here!

      BR, Manuel

    3. @Alfred & @Eduardo:
      I don't know if these threads (here) about isolating CPUs with Project C do get much attention over time.
      Refining during the last weeks, I've got a more or less simple setup working, combining the needed "isolcpus" kernel command line parameter, "cset shield", some "/sys/fs/cgroup/cpuset" manipulations and with some short scripts to migrate processes to the desired cpuset.

      Should I write a summary of our discussion of this topic into the "Issues" section @gitlab, including examples of scripts/ commands that do work?
      I still have further questions regarding this topic, and maybe others do as well. E.g. whether "cpuset.cpu_exclusive", "cpuset.sched_relax_domain_level" and "cpuset.sched_load_balance" have any effect with current Project C.

      Wouldn't it be good to have this topic over there?

      BR,
      Manuel
