Thursday, June 6, 2019

BMQ 0.96 release

BMQ 0.96 is released with the following changes:

1. Several enhancements in the core path.
2. Redo the sync-up commit that was reverted in 0.95.

Enjoy BMQ 0.96 for the v5.1 kernel :)

The full kernel tree repository can be found at https://gitlab.com/alfredchen/linux-bmq
The all-in-one patch can be found on GitLab.

Please report bugs at https://gitlab.com/alfredchen/bmq/issues.

18 comments:

  1. @Alfred:
    Sorry for still reporting this here; please don't mind.

    Something's wrong with this release:
    * It hardlocks somewhere in kernel bootup. Sometimes earlier, sometimes later.
    * Reverting 17f75c37 (the older problematic sync-up commit) may get it further, but it hardlocks later, too, e.g. when compiling the virtualbox modules in tty1 after a successful boot.

    Maybe one of you can pinpoint this issue to one of the most recent commits, as I don't have time to bisect it atm.

    TIA and best regards,
    Manuel

    1. +1, it locked up on the very first boot for me. Now trying to figure it out on a VM.

    2. Hmm... All commits have been tested for ~2 weeks on my machines. So it will take a bisect to find out which new commit causes the issue.

    3. So far, no luck with reproducing it ☹. I tried multiple VM reboots + burn_scheduler stuff w/o any hitch.

    4. @pf If it locks up on the very first boot, you can use bisect to test which commit causes the issue. There are just 5 commits from 0.95 to 0.96.

    5. I can, but the very first boot was the only one that locked up; after that it boots fine, which is why I wrote that I couldn't reproduce it.

    6. Just got a lockup during usual workflow, but netconsole didn't catch it. Hm ☹.

    7. Yeah, o.k. guys, I'll take the time needed during the coming hours to do a "pseudo" bisect (only adding/removing the relevant 5 commits) on my local kernel source (lack of disk space again for a "true" bisect) and report back when I see a result.

      The first commit for 0.96, a6ed4c54, works well atm.

      Unfortunately, I also got a lockup this afternoon, but with a 5.1.6 with BMQ 0.95 + the redone sync-up patch applied. Maybe it's still not as safe as wished.

      I hope the "gcc (SUSE Linux) 9.1.1 20190527 [gcc-9-branch revision 271644]" is not known for issues!?

      Manuel

    8. @Alfred and @Oleksandr:
      This was a short trip. After adding the second commit, problems began to arise:
      * Sometimes not getting through the whole kernel + system bootup up to KDE
      * Lockup when compiling the virtualbox modules in tty1 after a successful bootup
      (3 = three trials, none working)

      The commit in question is at least:
      ff3d9f84 "bmq: Introduce bmq_find_first_bit()/bmq_find_next_bit() macros."

      Please have a look, TIA and best regards,
      Manuel

    9. Faced this for the 3rd time, again w/o netconsole output. I'll leave it burning over night in a VM hoping for a serial console printout.

    10. I've managed to knock out the VM with my burn_scheduler stuff, but even then there's no output through the serial console.

    11. Please try the debug patch https://gitlab.com/alfredchen/bmq/raw/master/5.1/5.1_bmq096_debug.patch on top of BMQ 0.96 and see if it helps.

    12. @Alfred:
      Your FIX does work. With my above-mentioned reproduction steps, everything went fine now. The system has only been up for a few minutes, but it looks o.k. I hope it keeps this shape for longer...

      Thank you for your quick work!

      Manuel

    13. @Alfred and @Oleksandr:
      Additionally, it has already survived one TuxOnIce (most recent code) hibernation, too.
      So I'd say it's safe to give BMQ 0.96 a try _with_ the debug patch.

      Best regards to both of you,
      Manuel

    14. So far so good for me as well with that additional patch. Thanks.

    15. I can also only report good news since yesterday. It kept its good shape: all fine in my use cases.

      BTW, CPU utilisation seems to have changed for the better with (fixed) 0.96 vs. 0.95. Less overhead? Maybe a subjective impression only.

      Manuel

    16. It's caused by the x86 ffs() implementation. There is a comment about the return value of the function:
      * Undefined if no bit exists, so code should check against 0 first.
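
To illustrate the kind of guard that comment asks for, here is a minimal userspace sketch in C. It is not the actual fix (see the commit linked in the next reply) and it is not BMQ code: `NO_BIT_SET`, `first_bit_checked()` and `bitmap_first_bit()` are hypothetical names, and the GCC/Clang builtin `__builtin_ctzl()` stands in for the kernel helper because it has the same "undefined for 0" contract. In the mainline x86 headers, as far as I can tell, the quoted wording sits on `__ffs()`.

```c
#include <limits.h>

/* Number of bits in one unsigned long word. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical sentinel meaning "no bit is set in this word". */
#define NO_BIT_SET BITS_PER_WORD

/*
 * Return the index of the lowest set bit in word.
 * __builtin_ctzl(0) is undefined, so the zero case is handled
 * before the builtin is ever called.
 */
static inline unsigned long first_bit_checked(unsigned long word)
{
    if (word == 0)
        return NO_BIT_SET;
    return (unsigned long)__builtin_ctzl(word);
}

/*
 * Return the index of the lowest set bit in a bitmap of nwords words,
 * or nwords * BITS_PER_WORD if the whole bitmap is empty.
 * Empty words are skipped, so the builtin never sees 0.
 */
static inline unsigned long bitmap_first_bit(const unsigned long *map,
                                             unsigned long nwords)
{
    for (unsigned long i = 0; i < nwords; i++) {
        if (map[i] != 0)
            return i * BITS_PER_WORD
                   + (unsigned long)__builtin_ctzl(map[i]);
    }
    return nwords * BITS_PER_WORD;
}
```

A caller compares the result against the "not found" value (`NO_BIT_SET`, or `nwords * BITS_PER_WORD` for the bitmap variant) instead of trusting whatever the raw instruction leaves behind for an empty word.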

    17. I have pushed the official fix at https://gitlab.com/alfredchen/linux-bmq/commit/3606d92b4e7dd913f485fb3b5ed6c641dcdeb838

      Recently there has been no fancy new idea to try on BMQ, only code-level improvements based on the current design. In sanity tests, a slight improvement was recorded. Hopefully you can notice these changes.
