Sunday, June 28, 2015

Time to have fun with kernel 4.1

I have just pushed my BFS0462 port and the -gc BFS enhancement patches for kernel 4.1. There are no new features in the -gc branch, only some bug fixes and changes to sync up with the mainline kernel.

There is nothing remarkable; if I remember correctly, all of the changes are described in the commits.

Please check it out from Bitbucket or GitHub.

PS: I recently got a Google Chromebook Pixel (2013), so I can do some testing with SMT once I have set up the system on it.

BR Alfred Chen

Edit: We found that UP (uniprocessor) has been broken in BFS since kernel 3.18. The investigation is ongoing, but I have given it lower priority than the kernel 4.1 -vrq branch release.

Edit (Jul 17): Updated the -gc branch, rebased onto kernel v4.1.2, with fixes for compile errors that appear when certain kernel hacking config options are enabled.
9654667 bfs: [Fix] Fix undeclared sched_domains_mutex. 
b6e4eaf bfs: [Fix] Fix wrong rcu_dereference_check() usage. 
I have force-updated the linux-4.1.y-gc branch, so if you fetched it before, please delete the remote branch in your local git repository and fetch it again.
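
The re-fetch described above can be sketched as a small shell helper. This is only a minimal sketch, not necessarily the exact procedure intended: the function names `run_cmd` and `refetch_branch`, the `DRY_RUN` switch, and the remote name `origin` are assumptions for illustration; only the branch name `linux-4.1.y-gc` comes from the post.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Re-sync with a force-updated remote branch.
# Usage: refetch_branch <remote> <branch>   (both names are placeholders)
refetch_branch() {
    remote=$1
    branch=$2
    # Drop the stale remote-tracking ref that still points at the old history.
    run_cmd git branch -r -d "$remote/$branch"
    # Fetch again so the remote-tracking ref is recreated from the rewritten branch.
    run_cmd git fetch "$remote"
}
```

Calling `refetch_branch origin linux-4.1.y-gc` with `DRY_RUN=1` set prints the two git commands without touching the repository. A local branch built on the old history is not changed by this helper and would still need to be reset or rebased onto the re-fetched remote branch.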

25 comments:

  1. I ported BFS as well a bit earlier, and everything seems to be OK. However, I've updated the pf-kernel tree against your changes, and there are some small differences:

    https://github.com/pfactum/pf-kernel/commit/34b9112f8ca9175ed714465fd1b6495ddebec5c9

    Thanks for your work!

    Replies
    1. I have checked the differences; all are expected. :)

  2. I've been running this 4.1-gc kernel for some hours now. I know that 22 h and only 4 suspends to disk are not sufficient testing, but it seems to work very well so far.

    Additionally to -gc I have applied
    * BFQ for 4.0.0 without any modifications
    * TuxOnIce for 4.1.0-rc8 (only 1 hunk needed modification)
    * my usual patches to get my laptop fan working
    * Alfred's "old" patches for cpu optimisations, XOR templates, fast strings

    Many thanks for your great work,
    best regards,
    Manuel Krause

    Replies
    1. Thanks for testing. I have pushed additional patches to the -gc branch and am still waiting for the new BFQ release.

  3. This comment has been removed by the author.

    Replies
    1. Sorry...it was showing double posts after the 'Publish' so I tried to delete it.

  4. >We found an issue that UP is broken in BFS since kernel 3.18, investigation is going on but I put it in low priority than the kernel 4.1 -vrq branch release.

    Thanks Alfred; looking forward to the UP panic getting fixed, so I can remove the SMP workaround from my kernel config!

  5. Hi Alfred,

    Thanks a lot for your hard work!


    Unfortunately it fails with GCC 5.1:

    *kernel/sched/bfs.c:687:33: error: implicit declaration of function ‘cpu_sibling_mask’ [-Werror=implicit-function-declaration]
    * cpumask_and(res_mask, cpumask, cpu_sibling_mask(cpu))
    * ^


    *kernel/sched/bfs.c:709:33: error: implicit declaration of function ‘cpu_core_mask’ [-Werror=implicit-function-declaration]
    * cpumask_and(res_mask, cpumask, cpu_core_mask(cpu)) ||
    * ^



    * CC kernel/irq/dummychip.o
    * CC arch/x86/kernel/process.o
    * CC fs/btrfs/root-tree.o
    * CC mm/migrate.o
    * CC mm/huge_memory.o
    * CC mm/memory-failure.o
    *--
    * CC kernel/sched/completion.o
    * CC kernel/sched/idle.o
    * CC kernel/sched/cpupri.o
    * CC arch/x86/kernel/check.o
    *kernel/sched/bfs.c: In function ‘llc_cpu_check’:
    *kernel/sched/bfs.c:687:33: error: implicit declaration of function ‘cpu_sibling_mask’ [-Werror=implicit-function-declaration]
    * cpumask_and(res_mask, cpumask, cpu_sibling_mask(cpu))
    * ^
    *kernel/sched/bfs.c:687:33: warning: passing argument 3 of ‘cpumask_and’ makes pointer from integer without a cast [-Wint-conversion]
    *--
    * from kernel/sched/bfs.c:31:
    *include/linux/cpumask.h:351:19: note: expected ‘const struct cpumask *’ but argument is of type ‘int’
    * static inline int cpumask_and(struct cpumask *dstp,
    * ^
    *kernel/sched/bfs.c: In function ‘nonllc_cpu_check’:
    *kernel/sched/bfs.c:709:33: error: implicit declaration of function ‘cpu_core_mask’ [-Werror=implicit-function-declaration]
    * cpumask_and(res_mask, cpumask, cpu_core_mask(cpu)) ||
    * ^
    *kernel/sched/bfs.c:709:33: warning: passing argument 3 of ‘cpumask_and’ makes pointer from integer without a cast [-Wint-conversion]
    *--
    * from kernel/sched/bfs.c:31:
    *include/linux/cpumask.h:351:19: note: expected ‘const struct cpumask *’ but argument is of type ‘int’
    * static inline int cpumask_and(struct cpumask *dstp,
    * ^
    *kernel/sched/bfs.c: In function ‘thread_cpumask’:
    *kernel/sched/bfs.c:6930:9: error: implicit declaration of function ‘topology_thread_cpumask’ [-Werror=implicit-function-declaration]
    * return topology_thread_cpumask(cpu);
    * ^
    *kernel/sched/bfs.c:6930:9: warning: return makes pointer from integer without a cast [-Wint-conversion]
    *--
    * CC arch/x86/kernel/cpu/perf_event_intel.o
    * CC security/apparmor/domain.o
    * CC security/apparmor/policy.o
    * CC security/apparmor/policy_unpack.o
    *arch/x86/kernel/cpu/perf_event_intel.c: In function ‘intel_pmu_cpu_starting’:
    *arch/x86/kernel/cpu/perf_event_intel.c:2632:7: warning: unused variable ‘h’ [-Wunused-variable]
    *--
    * CC arch/x86/kernel/amd_gart_64.o
    * CC arch/x86/kernel/aperture_64.o
    * CC arch/x86/kernel/cpu/perf_event_intel_cqm.o
    * CC arch/x86/kernel/cpu/perf_event_intel_pt.o
    * CC arch/x86/kernel/cpu/perf_event_intel_bts.o
    *cc1: some warnings being treated as errors
    * CC arch/x86/kernel/cpu/perf_event_intel_uncore.o
    *scripts/Makefile.build:258: recipe for target 'kernel/sched/bfs.o' failed
    *make[2]: *** [kernel/sched/bfs.o] Error 1
    *scripts/Makefile.build:403: recipe for target 'kernel/sched' failed
    *make[1]: *** [kernel/sched] Error 2
    *Makefile:946: recipe for target 'kernel' failed
    *make: *** [kernel] Error 2

    Replies
    1. Have you confirmed that this happens only with GCC 5.1? Does an older GCC version work for you?
      I used GCC 4.8.x and now use 4.9.2, and I have never had such a compile issue with these two functions.

      BR Alfred

    2. Hi Alfred,


      (not sure why but either my comment is awaiting moderation or the browser or the site ate my reply)


      I took a deeper look at the kernel and realized that I had forgotten that it included scheduler changes for 4.2 - therefore the compilation issue.

      Starting with a new base from scratch - it compiled fine with GCC 5.1 and is running great so far

      (I did a few rounds of Mass Effect 3 Multiplayer with WINE staging in 1920x1080 - it really has come a long way, the BFS improvements clearly help)

      So please ignore that false alarm =)

    3. @kernelOfTruth
      Got your reply.

    4. There's an issue with current BFS and 4.1.

      Today I fired up a

      btrfs scrub start /

      and it hardlocked.

      A few days back it ran for a few seconds and then also hardlocked when I requested the status of the running scrub:

      btrfs scrub status /bak

      This was already an issue with BFS a few kernel releases back, when some small bugs had to be fixed (e.g. my incomplete port to a newer kernel, which ran fine overall but also hardlocked when firing up a btrfs scrub start).

      Anyone else experiencing this?

      Anyway, I'm back to a kernel with CFS for now; I don't have much time for testing out new stuff.

      Stability is priority no. 1 for now (I also lost enough time [2 days] troubleshooting a non-working Qt- and KDE-related desktop :/ )

      Hope the info helps to track this down.

    5. Seems like I'm not the only one:

      http://ck-hack.blogspot.com/2015/04/bfs-462-linux-40-ck1.html?showComment=1432135190412#c8301870429764130044

      http://ck-hack.blogspot.com/2015/04/bfs-462-linux-40-ck1.html?showComment=1436327448100#c8013470520406022151

      Alexander has the exact same issue: when running Btrfs scrub - it crashes

    6. Which BFS code base were you using when you hit the btrfs scrub issue: pure BFS, or BFS with some of the -gc commits?
      When I first learned of this btrfs issue on ck's blog, around the 3.18 or 3.19 time frame, I tested on my machines but could not reproduce it.

    7. linux-4.1.y-gc from https://github.com/cchalpha/linux-gc/commits/linux-4.1.y-gc

      up to commit https://github.com/cchalpha/linux-gc/commit/cd356bf85dbdba7ba7066e20ebc2adc51d38155e was being used

      Btrfs changes are always latest integration or for-linus branches merged against stable from http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git

    8. Update: I was unable to reproduce it with a btrfs USB partition. What is your btrfs setup?

    9. cryptsetup with AES and a key size of 512,

      cryptsetup -y --cipher aes-xts-benbi:sha256 --key-size 512

      for / (root), /usr/portage and /bak

      After opening the LUKS container,

      the partitions are mounted with noatime,nodiratime,compress=lzo

      The hardlock hardly ever occurs when the scrub is run on / (root) [35 GB size, 19 GB used]
      or /usr/portage [9.8 GB size, 4.7 GB used];
      both are on an SSD.

      It occurs almost instantly and reliably on /bak [3 TB size, 1.9 TB used].

    10. This was a rather "trivial" fix:

      The fix for the hardlock from upstream (ck) was removed by the following commit: https://github.com/cchalpha/linux-gc/commit/911bac7b2fcd8a7ec9d1b82109e77d89cb025c24

      After re-adding it, btrfs scrub has so far survived scanning 40 GB of data =)

      https://github.com/kernelOfTruth/linux/commit/a9efc3e88854732b724f99767f525a9849beb274

    11. Lol - that was too easy ;)

      The change made the system more resilient to the load, but it wasn't enough ...

      After roughly an hour it slowly hardlocked (while playing back music from YouTube + browsing on GitHub).

      The music snippet kept repeating and then eventually the music stopped.

      The system didn't respond to the Magic SysRq key :/

    12. @kernelOfTruth
      I have tested btrfs scrub on my production machine (a raid0 btrfs setup, 187G size, 83G used) and on all other partitions; every btrfs scrub ran fine without a deadlock.

      So I suggest you enable the kernel hacking config options below and see whether any useful log can be captured in dmesg when the deadlock happens.

      CONFIG_SCHED_DEBUG
      CONFIG_SCHEDSTATS
      CONFIG_SCHED_STACK_END_CHECK
      CONFIG_TIMER_STATS
      CONFIG_PROVE_LOCKING
      CONFIG_LOCK_STAT
      CONFIG_DEBUG_LOCKDEP

      Remember to use the latest -gc branch code; it contains 2 fixes that are needed when you enable these configs.
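
The options listed above could be collected into a Kconfig fragment. This is a minimal sketch: the helper name `debug_config_fragment` is hypothetical, and the option names are taken as-is from the list above. Whether each option is available depends on the kernel version and on its Kconfig dependencies.

```shell
#!/bin/sh
# Print a Kconfig fragment that enables the debug options from the list above.
# The function name is hypothetical; the option names come from the comment.
debug_config_fragment() {
    for opt in SCHED_DEBUG SCHEDSTATS SCHED_STACK_END_CHECK TIMER_STATS \
               PROVE_LOCKING LOCK_STAT DEBUG_LOCKDEP; do
        echo "CONFIG_${opt}=y"
    done
}
```

The output could be appended to `.config`, followed by `make olddefconfig` so that Kconfig resolves the dependencies; options whose dependencies are not met would be dropped at that step.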

  6. Hi, Alfred,
    Regarding your edit from July 17th, I want to thank you very much for your continued in-depth care for the BFS patches and for enhancing them!
    As I don't think I understand it correctly, can you please clarify: do the newly added fixes only fix compile-time errors, or also errors in BFS itself?

    So far, I can say, it's working well, and, yes, again: Thank you!

    Manuel Krause

    Replies
    1. I am reworking the -vrq branch and enabled some kernel hacking config options to help. While doing this, I hit some issues with those options enabled, so these two commits are the fixes.

    2. Oh, o.k., fine!
      I'm looking forward to the 4.1-vrq. And I hope to find more time to dig into systemd's /SuSE internals to safely disable and reenable the failing services+subprocesses on my running system. Still hoping I won't ever need it with your new release to come. ;-)

      Manuel

  7. @Manuel Krause:

    Looks like a rebase to me - so no functional changes or new patches

    Replies
    1. No, kernelOfTruth, according to the edited message on top, at least two new patches have been added:
      https://bitbucket.org/alfredchen/linux-gc/commits/96546670bc617a0d84b78664e9d3baf0f0c00de3?at=linux-4.1.y-gc
      and
      https://bitbucket.org/alfredchen/linux-gc/commits/b6e4eafcbc1bf5754d5703f80b76d22d53d6b3e6?at=linux-4.1.y-gc

      So, we'd better wait for Alfred's answer.

      Manuel Krause
