Tuesday, September 25, 2018

PDS 0.99a release

PDS 0.99a is released with the following changes:

1. Fix sugov_kthread_create failing to set its policy. (Thanks to jwh7 for reporting and testing.)
2. Fix a task burst fairness issue. Thanks to Holger for reporting a video playback issue, which reminded me that my mpv drops frames when running "git checkout", which creates a burst of "git" tasks. I used to believe it was a high I/O usage issue and ignored it. I'm not sure whether this commit fixes Holger's issue, but hopefully it will help.

This is another bug fix release. For #1, the current fix is not perfect. I'll leave it to the next kernel release cycle to decide how PDS handles the scheduler policies that exist in mainline.

Enjoy PDS 0.99a for the v4.18 kernel. :)

Code is available at https://gitlab.com/alfredchen/linux-pds
An all-in-one patch is available too.

33 comments:

  1. @Alfred, great news!
    I test these patches occasionally, and this one seems to resolve my oldest problem. The problem I reported was incorrect balancing in the 1:1 thread:cpu use case. In other words, PDS was unable to fully utilize the CPU with an 8-thread job on a 4c/8t CPU. This version works as expected, and also better than vanilla (sorry for the differing versions, I didn't test every PDS and kernel version):

    job: linux compilation -j8

    pds98y
    ==> Finished making: linux418 4.18.5-1 (Wed 5 September 2018, 13:15:05 CEST)
    6215.41user 816.07system 1:15:51elapsed 154%CPU (0avgtext+0avgdata 1769228maxresident)k
    336114058inputs+7863270outputs (303095major+460948151minor)pagefaults 0swap

    Vanilla
    ==> Finished making: linux418 4.18.6-1 (Thu 6 September 2018, 12:26:16 CEST)
    9082.02user 1131.42system 29:07.84elapsed 584%CPU (0avgtext+0avgdata 1769100maxresident)k
    347749798inputs+8059830outputs (311705major+470653173minor)pagefaults 0swaps

    pds99a
    ==> Finished making: linux418 4.18.9-1 (Thu 27 September 2018, 13:04:03 CEST)
    8538.24user 972.77system 22:49.01elapsed 694%CPU (0avgtext+0avgdata 1769220maxresident)k
    346232021inputs+8100155outputs (311872major+470772069minor)pagefaults 0swaps

    I will test whether pds98z works too, and also test on an AMD 8c/16t system.

    Regards, Dzon.
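
The elapsed times in the three `time` outputs above can be put side by side with a few lines of Python. This is only an illustrative sketch: the helper name `elapsed_seconds` is hypothetical, the figures are copied from the logs above, and, as noted, the runs come from different kernel versions, so the ratios are indicative at best.

```python
def elapsed_seconds(stamp: str) -> float:
    """Convert a GNU time elapsed field ("h:mm:ss" or "m:ss.ff") to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        # Each further field is worth 1/60 of the previous one.
        seconds = seconds * 60 + float(part)
    return seconds


# Elapsed fields copied from the three runs quoted above.
runs = {
    "pds98y": "1:15:51",
    "vanilla": "29:07.84",
    "pds99a": "22:49.01",
}

baseline = elapsed_seconds(runs["vanilla"])
for name, stamp in runs.items():
    secs = elapsed_seconds(stamp)
    # A ratio above 1 means the run finished faster than the vanilla run.
    print(f"{name:8s} {secs:8.2f} s  ({baseline / secs:.2f}x vanilla speed)")
```

With these figures, pds99a needs roughly 78% of the vanilla wall-clock time, while pds98y needed about 2.6 times as long as vanilla.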

    Replies
    1. This is how it looks in system monitor:
      https://imgur.com/7yovR9w
      System overhead is a little bit higher than it was in the old BFS (~10%), but that is understandable given the increased complexity of this scheduler. I will look for old system monitor pictures for comparison or re-run the non-working test.

      Regards, Dzon.

    2. @Dzon
      Thanks for sharing. But I don't see the under-utilization issue that you do. Kernel compilation always occupies 100% CPU on all cores/threads here. There was just one exception: a bugged version of "make" was once installed, which drove me crazy to track down. So I believe your issue is most likely not scheduler related.

      Anyway, good to know that it now works well for you. :)

    3. Yes, I know. It is weird that nobody else has had this problem. Remember, vanilla and MuQSS (also VRQ until 4.8 on the same distribution) work. I suspected the kernel config, but it would still be good to know the culprit. Maybe Manjaro changed the config. I will continue testing to find out; it should be easy now that I have a working system.

      BR, Dzon.

    4. Hi Alfred,
      I have tested older patches and then older kernel versions. The result was the same. So I started rolling back changes I had made to the system and found that the ananicy daemon is the reason PDS works better for me. If I disable it, the problem comes back. This daemon sets process priorities according to configured profiles. The priority of the compile job doesn't change even with ananicy enabled, and still it has a massive effect. If you need me to run some tests, let me know.

      Regards, Dzon.

    5. @Dzon
      I'd suggest testing in a clean environment, for example "single mode", with as many daemons disabled as possible.

    6. Well, yes, of course. But that's when it doesn't work (it only seemed to work thanks to the daemon). I'm sure that it is some specific setting in the Manjaro kernel config. But vanilla and MuQSS still work (as BFS did), so I'm trying to find the cause. PDS shouldn't behave this way under any circumstances, so it is a bug. I tried to carry Ubuntu settings over into the Manjaro kernel config the first time I reported the problem; the behavior changed a bit, but nothing conclusive. This last report was to let you know that it probably has something to do with process priorities, which might help in finding the cause.
      BR, Dzon.

    7. @Dzon
      When you are in the problematic state, there are two ways to get to a good state. One is "minus": isolating software and reducing the kernel config. Somehow, I believe you can get a good PDS kernel working for your tasks, like most others did. This way, we can find the cause of your issue, and if it can be reproduced on my side, it should be easy to fix. The other way is "plus": adding something that turns wrong into right. Luckily, you have found one. But that is hard to reproduce on my side, as I don't have your initial problematic state. Yes, we know it may be priority related, but that is still not enough information to look into it. I need more information.
      May I ask what your ananicy profile setting is for kernel compilation when it works with the PDS scheduler? Can I have a sample of what this profile looks like? I think I also need to know how ananicy works; I will see if I can learn that by searching its source code.

    8. @Alfred,
      Thank you for looking at this rare problem (only one person affected). Yes, the priority is some indication but doesn't tell us much. I know that you need more specific causes. I already tried adapting parts of the Ubuntu config, but it was very time-consuming and I didn't find anything, though I didn't see it through either. I can't copy the config 1:1; the system didn't boot. Maybe I will try this again on my Ryzen system (more power). I'll check how the priorities get changed by ananicy on my work computer and let you know. Ananicy has a lot of profiles, and most of them don't get used. I actually only made one custom profile, for pulse-effects.
      BR, Dzon.

  2. Another update: I have started working on the 4.19 sync-up. The first boot of 4.19-rc5 with PDS is working.

    Replies
    1. Nice Alfred; are those patches available somewhere for those of us also wanting to use PDS on the latest rc? :-)

  3. Working fine here on i7@work. I have not had time to test this on Ryzen@home, but will do.

    For those who use full tickless: have you noticed task accounting issues again? I happened to run things in parallel on the i7 (that was on the latest PDS 0.98 version), and CPU utilization showed 100% usage, but the sum of individual task CPU usage was nowhere near the total. Maybe I had a weird config or task. I'll recheck this at some point in the future.

    BR, Eduardo
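
One way to check the accounting mismatch described above is to compare the aggregate busy time in `/proc/stat` with the sum of `utime + stime` over all `/proc/<pid>/stat` entries. The sketch below is a rough illustration, not a diagnostic tool: the function names are hypothetical, both counters are in clock ticks, and some gap is expected even on a healthy system, since interrupt time is counted globally and the time of tasks that have already exited disappears from the per-task sum.

```python
import os


def total_busy_ticks(proc_stat_text: str) -> int:
    """Sum the non-idle fields of the aggregate "cpu" line of /proc/stat."""
    fields = proc_stat_text.splitlines()[0].split()
    user, nice, system, idle, iowait, irq, softirq, steal = map(int, fields[1:9])
    # idle and iowait are excluded; guest time is already part of user/nice.
    return user + nice + system + irq + softirq + steal


def task_ticks(pid_stat_text: str) -> int:
    """Return utime + stime from the contents of one /proc/<pid>/stat file."""
    # The command name may contain spaces or ')', so split after the last ')'.
    rest = pid_stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def snapshot():
    """Return (system-wide busy ticks, summed per-process ticks)."""
    with open("/proc/stat") as f:
        total = total_busy_ticks(f.read())
    per_task = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                per_task += task_ticks(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
    return total, per_task
```

Calling `snapshot()` twice a few seconds apart and subtracting the pairs gives the two deltas to compare; a per-task delta far below the system-wide one during a CPU-bound job would match the symptom reported here.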

  4. Hi Alfred,

    could you please implement working functionality for the CoreFreq utility (https://github.com/cyring/CoreFreq)? It seems that PDS is not exporting the necessary symbols (as the normal scheduler does), so the kernel module can't be compiled.

    Thanks.
    Regards sysitos

    Replies
    1. @sysitos, sorry for the late reply. I was on vacation. What APIs does it need? If they are reasonable, at least a dummy version can be provided.

    2. Hi Alfred,

      no problem, I hope it was a nice vacation.
      Sorry, I can't say what exactly is needed, I'm not good at programming :(
      I only found some comments via Google where it was mentioned.
      Maybe you could be so kind as to have a quick look at it?
      I only verified here that with PDS enabled it doesn't work, but without it (that is, with the standard scheduler) it does work.

      Thanks and Regards.
      sysitos

    3. Hi Alfred,

      this is the error message from dkms (hand translated into english ;) )
      /var/lib/dkms/corefreq-git/1.35.1.r536.g8ef56b5/build/corefreqk.c: in function »Sys_DumpTask«:
      /var/lib/dkms/corefreq-git/1.35.1.r536.g8ef56b5/build/corefreqk.c:4354:43: error: »struct task_struct« no such element with name »se«
      SysGate->taskList[cnt].runtime = thread->se.sum_exec_runtime;

      Maybe it helps.

      Regards sysitos

    4. @sysitos
      "se" is the scheduling entity (struct sched_entity), which is used in the mainline run queue data structures. PDS doesn't use this data structure at all, so there is no way to support it. Sorry about that.

    5. Hi Alfred,

      thanks for the info. CoreFreq is a nice Intel CPU monitoring tool with the possibility to modify some specs (like turbo boost values). So it was a "workaround" until I have to disassemble my Acer notebook, with its really bad thermal design, again to renew the fans and thermal paste.

      Regards sysitos

  5. I'd like to run a little survey here.
    Do you know about the ISO scheduler policy, and how often do you use it?

    Replies
    1. I know that PDS implements it, but CFS doesn't, and I haven't used it.

    2. I use ISO scheduler policy when gaming under Wine. Seems to be a tad smoother tbh :)

    3. @Sveinar
      ISO has higher priority than NORMAL policy tasks, which explains that tad of smoothness. But from my point of view, it introduces special cases into the scheduler code. So I am going to remove ISO support; maybe you can try nice --20 for wine in the next PDS kernel release cycle.

    4. Hmm.. And i have just gotten used to using this, as doing "nice" for wine is a pita.. spawning various wineserver crap and all kinds of *hit.

      Launching the windows.exe through wine with 'schedtool -I -e wine windows.exe' is a LOT easier than finding a bunch of PID's every time i start something wine'ish.

      There is a "STAGING_RT_PRIORITY_SERVER=" env that kind of does this, but imo with schedtool this was so much easier.

      But.. your source :) Can't force you.. sad to lose this tho, and for all I know it could be just a placebo effect, but I started using it when I used a kernel with the MuQSS scheduler, and was pleasantly surprised when PDS also supported this.

      PDS is in MY opinion smoother than MuQSS when doing some wine-gaming :)

      PS. Now that "Proton" (custom wine i gather) from Steam is full-on-steaming to enable windows gaming through Steam.. i would suspect "gaming kernels" that improve wine performance will become even more popular.

    5. @Sveinar
      I'd like to share my further thoughts here.

      I want to remove ISO because its design is not so good and it introduces a lot of branches in the scheduler code.

      I do believe that normal users sometimes just want something superior to the NORMAL policy that is easy to use. It doesn't necessarily have to be an "RT" policy like the current ISO is.

      So I will bring back a new "ISO", but I haven't decided what it will be. Maybe I should keep the current ISO until I have the replacement, which will be friendly for users like you.

    6. The idea with "ISO" was that it could be used by a regular user without special permissions and be less "dangerous" than "nice" or "rt" levels, if I understood it somewhat correctly.

      Using nice or wine's "STAGING_RT_PRIORITY_SERVER" requires the user to have permission for this. Probably not a huge issue for a single home user, but it still opens possibilities you might not want. (It requires a limits.conf edit to allow the user to set this.)

      Using nice, as I said above, is not really intuitive when it comes to processes that spawn sub-processes and load lots of other libraries. And starting something at nice level -20 requires sudo, and I would absolutely not start the app/game/wine with sudo AT ALL. So from a user-friendly pov, I'd say "schedtool" w/sched_iso is easy :)

      There is also a "gamemode" daemon ppl use for gaming : https://github.com/FeralInteractive/gamemode that utilizes sched_iso. You could try asking ppl there, as i think this requires sched_iso...
      --
      GameMode can leverage support for soft real time mode if the running kernel supports SCHED_ISO. This adjusts the scheduling of the game to real time without sacrificing system stability by starving other processes.
      --
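
The wrapper approach discussed in this thread (set the environment, request SCHED_ISO once, then start wine so that everything it spawns inherits the policy) could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the names `build_env`, `try_sched_iso` and `launch` are hypothetical, `SCHED_ISO = 4` is the policy number that mainline's uapi `sched.h` marks as reserved for SCHED_ISO, and whether an unprivileged request succeeds depends on the running kernel. It relies on the Linux behavior that the scheduling policy is inherited across fork() and preserved across exec().

```python
import os
import sys

SCHED_ISO = 4  # policy number reserved for SCHED_ISO in the kernel's uapi sched.h


def build_env(base, overrides):
    """Return a copy of base with overrides applied (the `export` step)."""
    env = dict(base)
    env.update(overrides)
    return env


def try_sched_iso():
    """Ask for SCHED_ISO for this process; return True if the kernel accepted it."""
    try:
        os.sched_setscheduler(0, SCHED_ISO, os.sched_param(0))
        return True
    except (OSError, AttributeError):
        # Kernels without SCHED_ISO reject the policy; non-Linux lacks the call.
        return False


def launch(argv, overrides):
    """Replace this process with argv, keeping the policy set above."""
    if not try_sched_iso():
        print("SCHED_ISO not available, using the default policy", file=sys.stderr)
    os.execvpe(argv[0], argv, build_env(os.environ, overrides))


# Example (would replace this process with wine; the prefix path is a placeholder):
# launch(["wine", "windows.exe"], {"WINEPREFIX": "/path/to/prefix"})
```

Compared with renicing individual PIDs after the fact, setting the policy before exec means helper processes such as wineserver start with it already applied, which is the convenience attributed to `schedtool -I -e` above.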

    7. @Sveinar

      I used something like this for running a process with higher priority and still only with user privileges:

      sudo nice --20 sudo -u $USER $CMD

      It's dirty and you still use superuser, but it does the trick, and the niced process runs as the user. Actually, the daemon (ananicy) I mentioned earlier might do what you are looking for without side effects.

      Dzon.

    8. @Anonymous
      That did not really work that well with wine. I use a shell script that sets loads of options with "export" (eg. export WINEPREFIX="). Now, ofc i could make it a "oneliner", so that it would keep within its "sudo -u $USER" environment, but tbh, to have a 1-line thing to set 11-12 env variables just to be able to do this = crapload of trouble vs. current sched_iso. Probably doable with the right script im sure, but beyond me.

      I also believe whatever script(s) steam is using also would kinda fail using this sudo method.

      I will look into ananicy daemon. Probably something similar to the "gamemode daemon" i mentioned too :)

  6. I know a little and almost never use it (and that's why I don't know it). There were times when I tried to use it to maximize performance of wine, but outside that - no need for me.
    BR, Eduardo

  7. I asked because I am thinking about policy support in PDS. ISO is legacy code from BFS, and I am considering removing it.

  8. Hi,
    On my AMD Turion 2 laptop I get microstuttering when playing 1080p video with mpv.
    With the standard Ubuntu kernel I don't get it, and video playback is perfectly fluid.
    Now, in order to be 100% sure that it is PDS related, I am compiling the kernel with the same config as the original kernel, the only difference being that PDS is activated.
    I'll be back with more details.

    Replies
    1. Hi,
      I've done a lot of tests and the issue is NOT related to PDS!
      The DE was the culprit: even with the compositor disabled the issue remained, and after moving to another DE it disappeared.
      Many thanks for developing this great scheduler.

    2. @Alessio G
      Thanks for the effort of reporting and testing anyway.
