Wednesday, August 22, 2018

PDS 0.98w release

PDS 0.98w is released with the following changes:

1. Rework the rq->sl_header data structure. It is now a pointer to idle->sl_node, which unifies the sl_node -> task conversion.
2. Policy-based task_preemptible_rq(). IDLE/BATCH/NORMAL tasks now have their own strategies in the ttwu code path.
3. Rework the take_other_rq_task() code path. This change prepares for the upcoming SMT_NICE feature.
4. Try to take tasks from other rqs on a priority downgrade. When an rq is about to switch to a task of lower priority than the previous one, it now looks for higher-priority tasks in other run queues.
5. Introduce sched_rq_pending_masks. This replaces sched_rq_nr_running_masks and provides a better filter mask for the take_other_rq_task() code path.
6. Change the per-CPU cpu_has_smt_sibling to sched_sibling_cpu. Another preparation for SMT_NICE.
7. Fix a UP/non-SMT compilation warning.
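
To illustrate how a "pending" mask can act as a filter when looking for work on other run queues, here is a rough user-space sketch. It is only an assumption about the general idea, not the PDS implementation: pending_mask, nr_queued, mark_enqueue(), mark_dequeue() and pick_source_cpu() are hypothetical names, and the real sched_rq_pending_masks may be organized quite differently.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8                 /* hypothetical CPU count for the sketch */

/* Bit n is set while CPU n has at least one queued task waiting to run. */
static uint32_t pending_mask;
/* Number of queued (not currently running) tasks per CPU. */
static int nr_queued[NR_CPUS];

/* Account for a task being queued on cpu; set its bit on the 0 -> 1 edge. */
static void mark_enqueue(int cpu)
{
    if (nr_queued[cpu]++ == 0)
        pending_mask |= 1u << cpu;
}

/* Account for a task leaving cpu's queue; clear its bit on the 1 -> 0 edge. */
static void mark_dequeue(int cpu)
{
    assert(nr_queued[cpu] > 0);
    if (--nr_queued[cpu] == 0)
        pending_mask &= ~(1u << cpu);
}

/*
 * Pick a CPU to take a task from, or return -1 if nobody has pending work.
 * The mask lets the caller skip every run queue that has nothing to offer
 * without touching that run queue at all.
 */
static int pick_source_cpu(int this_cpu)
{
    uint32_t candidates = pending_mask & ~(1u << this_cpu);

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (candidates & (1u << cpu))
            return cpu;
    return -1;
}
```

The point of the sketch is that a mask of run queues with waiting tasks is a tighter filter than a mask of run queues that merely have something running.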

#1 and #2 are the new code changes. I personally like #1, which actually implements the idle task as the tail of the task list in the rq, just like the old textbooks said. The other changes are preparation for SMT_NICE in the next release.
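
The "idle task as the tail" idea can be sketched in user-space C as follows. This is a simplified illustration under stated assumptions, not the kernel code: it uses a plain circular doubly linked list instead of whatever structure sl_node really belongs to, and struct node, struct task, struct rq, node_to_task(), rq_init(), rq_enqueue(), rq_dequeue() and rq_first() are all hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Simplified stand-ins for illustration only; not the PDS structures. */
struct node {
    struct node *next;
    struct node *prev;
};

struct task {
    const char *name;
    int prio;               /* lower value = higher priority */
    struct node sl_node;
};

struct rq {
    struct task idle;       /* the idle task lives in the run queue */
    struct node *sl_header; /* points at idle.sl_node */
};

/* The single node -> task conversion, valid for every node in the list. */
#define node_to_task(n) \
    ((struct task *)((char *)(n) - offsetof(struct task, sl_node)))

static void rq_init(struct rq *rq)
{
    rq->idle.name = "idle";
    rq->idle.prio = INT_MAX;  /* always sorts last */
    rq->idle.sl_node.next = &rq->idle.sl_node;
    rq->idle.sl_node.prev = &rq->idle.sl_node;
    rq->sl_header = &rq->idle.sl_node;
}

/* Insert p in priority order; the idle node terminates the walk. */
static void rq_enqueue(struct rq *rq, struct task *p)
{
    struct node *pos = rq->sl_header->next;

    while (pos != rq->sl_header && node_to_task(pos)->prio <= p->prio)
        pos = pos->next;

    p->sl_node.prev = pos->prev;
    p->sl_node.next = pos;
    pos->prev->next = &p->sl_node;
    pos->prev = &p->sl_node;
}

/* Unlink p; the idle task itself is never removed. */
static void rq_dequeue(struct rq *rq, struct task *p)
{
    assert(p != &rq->idle);
    p->sl_node.prev->next = p->sl_node.next;
    p->sl_node.next->prev = p->sl_node.prev;
}

/* Highest-priority task; on an empty queue this is the idle task. */
static struct task *rq_first(struct rq *rq)
{
    return node_to_task(rq->sl_header->next);
}
```

Because the header is the idle task's own node, rq_first() needs no "is the queue empty?" branch: when nothing else is queued, header->next is the header itself and the same conversion yields the idle task.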

Enjoy PDS 0.98w for the v4.18 kernel :)

Code is available at https://gitlab.com/alfredchen/linux-pds.
An all-in-one patch is available too.

40 comments:

  1. Compiles and boots OK on Ryzen. A couple of games run smoothly as far as testing goes (a couple of minutes).

    Br, Eduardo

  2. Spoke too soon: https://pastebin.com/jkbJBXZ7
    This is with nohz_full.

    Br, Eduardo

    Replies
    1. I recompiled the kernel once more; my previous patching job probably went wrong. I'll test again in a couple of days and report back.

      Br, Eduardo

    2. Any feedback is welcome.

    3. This issue went away after I patched correctly; so far it's good. Sorry for the noise.
      Br, Eduardo

  3. Builds/boots fine here, also with cgroups patch applied. Thanks.

  4. Thanks Alfred.
    See the usual throughput tests here :
    https://docs.google.com/spreadsheets/d/163U3H-gnVeGopMrHiJLeEY1b7XlvND2yoceKbOvQRm4/edit?usp=sharing

    PDS 0.98w is faster than CFS on average, and always within 1% of CFS.
    Also, I've switched to NO_HZ_FULL.

    Pedro

    Replies
    1. Thanks for the tests. I'd suggest also running them under the BATCH and IDLE policies. That will be a good baseline reference for the upcoming SMT_NICE feature.

    2. OK, will do. Just to be sure: BATCH is 'schedtool -B' and IDLE is 'schedtool -D', right?

      Pedro

  5. Hi Alfred,
    huge problems here. The system freezes completely after login (Acer laptop, KDE on Arch Linux, 64-bit i7-4710HQ). The Plasma shell starts, NetworkManager tries to connect, and the system stops. No SysRq keys or Ctrl+Alt+Del work; there is no response at all, and I must hard power off the system. No problems so far with older versions of PDS.
    Tested with pf-linux and with the plain Linux kernel source with only your full patch applied, with and without nohz. Changing only the scheduler to CFS in the config makes everything fine, with no problems at all. I haven't seen any problems in the journal log before or during the freeze.

    On the other hand, on my ancient laptop (Pentium 4 Mobile with Arch32), the pf-kernel with PDS runs without problems.

    Regards sysitos

    Replies
    1. Hi,
      I have hard freezes as well, with nothing in the logs, but I can't currently say with 100% certainty that PDS 0.98w is at fault. I'll test older versions and report back.
      Br, Eduardo

    2. Eduardo has confirmed there is no issue after correctly applying the patch above. So the only remaining issue is sysitos's 4710HQ.
      @sysitos
      "No problems so far with older versions of PDS." Does "older versions" mean 098v? If so, would you please use git bisect to find out which commit in PDS 098w introduced the issue on your Acer laptop? It only takes 3~4 tries. Finding the offending commit gives the best hint when no kernel log can be captured.

    3. Yes, I think version 098v and earlier were good. I've always used PDS with the pf-kernel, so until today I had no separate checkout of your git tree. I only double-checked with the vanilla kernel and your all-in-one patch to rule out pf-kernel anomalies. I'll look into how to run git bisect and will let you know.

      Regards sysitos

    4. Thanks, @sysitos. My best guess is that a particular task running after login causes the issue, and that it is not yet covered by the new code changes in 098w. I'll wait for your git bisect result to get enough hints for the investigation.

    5. @Alfred, I corrected my patching and the issue above went away, BUT hard freezes are another issue I still have.
      The hard freezes happen from time to time while gaming: one to two hours of gaming ends in a hard freeze. It's not easy for me to reproduce the problem. I reverted to 098v; let's see whether it freezes in the coming days. Maybe my issue is not related to PDS. I'll report back on how 098v fares.
      Br, Eduardo

    6. Hi Alfred,
      commit 1e72b40cf0605fe09aca0442c2644d4b9228ffc1 is the last working commit
      or with git words:
      #git bisect good
      1e060b2ba6b526a7cb63db24c636d8c6a2a5cbda is the first bad commit
      commit 1e060b2ba6b526a7cb63db24c636d8c6a2a5cbda
      Author: Alfred Chen
      Date: Sat Aug 4 08:40:33 2018 +0800
      pds: Fix possible task lost in migrate_tasks().

      Maybe this helps: I could trigger the freeze after a suspend to RAM, but sometimes it already freezes after login.

      regards sysitos

    7. @sysitos
      Thanks. That's the most unexpected commit among them.
      But it's easy to test. Would you please revert it and see if it helps with the freeze?

    8. PS: migrate_tasks() is only called in sched_cpu_dying(), which happens only when a CPU goes offline. So it can't explain why the system freezes after login.
      Please revert 1e060b2ba6b526a7cb63db24c636d8c6a2a5cbda and test. If it still freezes after login, please go back to 1e72b40cf0605fe09aca0442c2644d4b9228ffc1 and test more.

    9. Hi Alfred,

      I tried reverting both parts of the commit (reverting a single part always freezes), but in the end it froze again (though not as quickly). Maybe I need to test more thoroughly, and commit 1e72b40cf0605fe09aca0442c2644d4b9228ffc1 wasn't as safe as it first looked.

      Regards sysitos

    10. Hi Alfred,
      after some more tests, the 098v version seems to be fine, with no freezes here. (Some error logs: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 5: ee0000000040110a
      mce: [Hardware Error]: TSC 0 ADDR fef81d40 MISC 38a0000086) But the next commit after it, 1e72b40cf0605fe09aca0442c2644d4b9228ffc1, already freezes the system (after the second suspend). So I hadn't tested enough before my previous post; sorry for the wrong message.

      PS: A simple git revert doesn't work for undoing this commit, though. Any suggestions?

      Regards sysitos

    11. @sysitos
      I need more clarification to isolate the issue. On commit 1e72b40cf0605fe09aca0442c2644d4b9228ffc1, does the freeze only happen on suspend? If you keep the system running without suspending, is there any freeze?

    12. Hi Alfred, with commit 1e72b40cf0605fe09aca0442c2644d4b9228ffc1 I had exactly 1 freeze, and that was after a suspend. I did no more tests on it: during git bisect I only tested each version until I reached a freeze and then checked out another commit. With newer commits, the freeze seems to occur more quickly. Most of the time, the freeze comes right after login, while NetworkManager tries to connect to my WLAN. (I suspected this was the cause, but with a normal LAN connection there is a freeze too.) Sometimes the freeze comes later during normal work (e.g. in Firefox or the console, with no high load). Some freezes come after s2ram, and 1 came during s2ram. All are hard freezes; only the power button helps. With commit 1e72b40cf0605fe09aca0442c2644d4b9228ffc1 I worked longest without a freeze, so at first I assumed there was no problem with this commit. But I can check whether a freeze also occurs during normal work on this commit.

      Regards sysitos

    13. Hi Alfred,

      quick test with commit 1e72b40cf0605fe09aca0442c2644d4b9228ffc1, freeze occurs within approx. 2 minutes after restart (was starting konsole, firefox, kontact). No suspend involved.

      Regards sysitos

    14. @sysitos
      Thanks for the quick test. First, let's put suspend aside: it's on another code path, which may also be buggy on your machine, so let's come back to it later. I'll take some time to provide one or more patches on top of 1e72b40cf0605fe09aca0442c2644d4b9228ffc1 for your freeze issue.

    15. After investigation, I have decided to revert commit 1e72b40cf0605fe09aca0442c2644d4b9228ffc1. The major problem is that the idle task's queue status depends on whether there are other tasks in the rq. I'm not sure whether it is the cause of the freeze, but it is plainly wrong. So, no matter how rarely the issue can be triggered, it must be undone.
      I'm preparing the revert patch and will make a new PDS release this weekend.
      Thanks, all, for testing and providing feedback.

    16. Hi Alfred,

      I'm glad that I could help you a little bit (and, by the way, I have learned a new git command ;) ). Nice to see that you identified the problem so quickly.

      PS: I prefer your PDS scheduler over CFS because it's snappier, and over MuQSS because it works better with BFQ and is updated more often. (And it's included in the pf-kernel, thanks Oleksandr.) So a big thanks from me to you, Alfred, for your work.

      Regards sysitos

    17. @sysitos
      Please try this revert patch on top of PDS 098w.
      https://gitlab.com/alfredchen/PDS-mq/raw/master/4.18/v4.18_pds098w_revert_data_structure.patch

    18. Hi Alfred,

      the patch applied and compiled fine. Tested on 2 kernels: your git tree, and Linux 4.18.5 with PDS 0.98w and then the revert patch.
      BUT: Both have a kernel oops already during boot-up. In other words, it's really not better now ;)

      Regards sysitos

    19. @sysitos
      Any oops logs? A cellphone photo also works :)

  6. Hi Alfred,

    I've taken a photo with my cellphone; how should I share it?

    Regards sysitos

    Replies
    1. You can email it to me at cchalpha _at_ gmail dot com

  7. I am also having long freezes with my old AMD Turion II based laptop. Waiting for an updated patch without oopses in order to test it :)

    Replies
    1. So can you also try the revert patch and see how it works?

    2. >BUT: Both have a kernel oops already during boot-up. In other words, it's really not better now ;)

      Same here: I also get an error right at boot with the patch applied.

      Without the patch, PDS 0.98w seems to work fine, though.

    3. @kernelOfTruth
      Have you captured the error log?

    4. Yes, though I'm not sure it's useful: the top scrolled by so fast that I couldn't capture it on camera, and I'm using the zero-overhead unwinder with low accuracy:

      https://i.imgur.com/dKsQcCg.jpg

  8. @all
    Please try the v2 revert patch on top of PDS 098w.
    https://gitlab.com/alfredchen/PDS-mq/raw/master/4.18/v4.18_pds098w_revert_data_structure_v2.patch

    Replies
    1. Patch applied, compiled and runs fine without any oops, freezes or any other error.
      Thanks for your work.

      Regards sysitos

    2. V2 patch applied against pf-kernel 4.18-pf4 without errors.
      It seems to work very well on my Turion 2 too.
      I don't have freezes anymore, many thanks!!
