Friday, September 25, 2015

Consider cache in task scheduling Part 2

In this part, let's look at the first factor in task caching -- the CPU cache size. Speaking of cache, years ago there was a fight between Intel and AMD about cache size in their CPU designs. Intel tends to use a large CPU cache while AMD uses a smaller one. I remember one of AMD's explanations was that gaming software doesn't use a large cache, and I kind of agree with that.

IMO, the CPU cache size, especially the LLC (Last Level Cache) size, determines how much data the hardware can cache for the CPU. Looked at another way, given a sequence of task switches, the cache size determines how long a task's data can be kept in cache. On a system with a heavy workload, where a large number of tasks run at the same time, a CPU with a larger cache will keep task data in cache longer than one with a smaller cache. For workloads that need short response times, like gaming, a large cache doesn't help much. So AMD is right about that, but a large-cache design is good for common workloads, not just for workloads like gaming.

Task scheduling should take the CPU cache size (LLC size) into account. In the latest 4.2 VRQ branch there is a new commit that implements the first version of this change: it is aware of the CPU's LLC size and auto-adjusts cache_scost_threshold (the cache switch cost threshold) based on it. (For the concept of the cache switch cost threshold, please refer to Part 1.) Here is a brief summary of what has been done.

1. Export a scheduler interface to cacheinfo (drivers/base/cacheinfo.c) as a callback that is invoked once cacheinfo is ready.
2. Implement a simple linear formula to auto-adjust cache_scost_threshold (a sketch of this is shown below).
  • The formula is currently based on the Intel Core 2 CPU topology and a 64-bit kernel: every 512KB of LLC size increases the CACHE_SCOST_THRESHOLD value by 3.
  • For a 32-bit kernel, considering that 32-bit code uses smaller data/instruction sizes, every 256KB of LLC size increases the CACHE_SCOST_THRESHOLD value by 2, but I don't have a 32-bit system to prove this yet.
  • Newer CPU topologies such as SMT are not yet considered in the formula, because the benchmark results bounce around when SMT is enabled, which makes it hard to compare results for different CACHE_SCOST_THRESHOLD values.
3. A kernel boot parameter "cache_scost_threshold" is introduced; it can be used to manually assign the CACHE_SCOST_THRESHOLD value to the scheduler if
  • cacheinfo is not available on your arch (like ARM, in most cases?) or
  • the simple linear formula doesn't cover your CPU topology and you want to find the best value yourself.
It's still at its first version; in the next version I'd like to complete the formula to cover SMT topology.
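
To make the above more concrete, here is a minimal sketch of how the cacheinfo callback, the linear adjustment and the boot parameter override could fit together. This is only my illustration of the description above, not the actual -vrq commit: the function names, the rounding and the lack of a base value are assumptions; only the 512KB-per-3 / 256KB-per-2 relation and the "cache_scost_threshold" parameter name are taken from the list above.

/*
 * Hypothetical sketch -- not the actual -vrq code.
 * It shows: a callback that drivers/base/cacheinfo.c could invoke once
 * the LLC size is known, a linear mapping from LLC size to the cache
 * switch cost threshold, and a kernel command line override.
 */
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/types.h>

static int cache_scost_threshold;   /* cache switch cost threshold */
static bool cache_scost_forced;     /* true when set via boot parameter */

/* 64-bit: every 512KB of LLC adds 3; 32-bit: every 256KB adds 2 */
static void update_cache_scost_threshold(unsigned int llc_size_kb)
{
        if (cache_scost_forced)
                return;
#ifdef CONFIG_64BIT
        cache_scost_threshold = llc_size_kb / 512 * 3;
#else
        cache_scost_threshold = llc_size_kb / 256 * 2;
#endif
}

/* interface exported to cacheinfo, called back once cacheinfo is ready */
void sched_cacheinfo_ready(unsigned int llc_size_kb)
{
        update_cache_scost_threshold(llc_size_kb);
}

/* manual override: cache_scost_threshold=<n> on the kernel command line */
static int __init cache_scost_threshold_setup(char *str)
{
        if (!kstrtoint(str, 0, &cache_scost_threshold))
                cache_scost_forced = true;
        return 1;
}
__setup("cache_scost_threshold=", cache_scost_threshold_setup);

With this mapping, for example, a 3MB LLC gives 3072 / 512 * 3 = 18, which matches the default value mentioned in the comments below, and a 6MB LLC gives 36.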

... to be continued (Part 3)

17 comments:

  1. Thank you very much for your explanations of what happens under the covers of the -vrq code.

    The CACHE_SCOST_THRESHOLD value leads me to the following question: how would you describe the expected behaviour of increasing or decreasing this value, from your programmer's point of view (plus maybe from your testing experience)? In particular, would it -- and if so, how -- affect performance / throughput / interactivity?

    Thank you in advance and best regards,
    Manuel Krause

    Replies
    1. The CACHE_SCOST_THRESHOLD value should just focus on performance and throughput, measured by benchmarks/testing etc. Don't take interactivity into account; I'd like to take care of interactivity for different kinds of tasks and kernel configs (system intention) in Part 3.

    2. It's a very sensitive setting, I need to say, also for other sub-systems. Going to 12 from the default [18] made TOI stop working. Lowering the value seemed to make the system "subjectively" faster and more interactive. The last known good value was 16, set manually via the kernel command line.

      Manuel

    3. @Manuel
      I think you don't need to set this boot parameter manually, as your CPU topology is covered by the current formula. Normally I prefer an out-of-the-box design and don't like to expose too many configurable things, but cacheinfo is not well supported across the different arches.

    4. Your blogspot interface is failing again. If the next posting comes twice, please, delete the first. Manuel Krause

    5. Of course, I didn't have the need to change it manually. Personally, I don't like black-box systems, and I appreciate the possibility you've introduced to play a bit with it. At least, no one else among us knows how you generated the factor(s) of the CACHE_SCOST_THRESHOLD formula.
      But, in the end, it turns out that your automatism was wisely chosen, with regard to BOTH stability and performance.
      Really good work, and I'm looking forward to your next chapter (Part 3).

      Kind regards,
      Manuel

    6. This -- my last -- statement ONLY applies to MY one system: an Intel Core 2 Duo dual-core CPU with integrated gfx, no HT, no SMT (and also not enabled in the kernel .config).

      So, you others are still encouraged to test current values for your system. This means: WE NEED MORE TESTERS !!!

      Best regards,
      Manuel

  2. Hi,
    I have an i7-5960X and would like to do some benchmarking with the patch.
    What exactly are you using to test performance?

    Jan

    Replies
    1. Sorry for the late reply, I was on vacation. Currently I just use my simple sanity test scripts to do kernel compiles under different workloads. I have uploaded them to https://bitbucket.org/alfredchen/linux-gc/downloads/sanity and https://bitbucket.org/alfredchen/linux-gc/downloads/compile_throughput

      If you want to use the scripts, you may need to set up your test kernel source tree, put your test kernel config there, and modify the script to point at the path of the kernel source tree.

      And PS, to minimize the impact of other activity on the system, I usually boot into console mode to run the sanity tests.

      I hope this info is helpful for you.

      BR Alfred

  3. Hi Alfred,
    sooo much time with your patches without any issues -- that's a really nice experience! :-)))

    Today I tried compiling a 4.2.4-patched kernel and got the following errors (sorry for the possibly mangled whitespace):
    ...
    CC kernel/sched/bfs.o
    CC arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.o
    kernel/sched/bfs.c: In function ‘_cond_resched’:
    kernel/sched/bfs.c:5272:6: error: too few arguments to function ‘should_resched’
    if (should_resched()) {
    ^
    In file included from include/linux/preempt.h:64:0,
    from include/linux/spinlock.h:50,
    from include/linux/mmzone.h:7,
    from include/linux/gfp.h:5,
    from include/linux/mm.h:9,
    from kernel/sched/bfs.c:31:
    ./arch/x86/include/asm/preempt.h:93:29: note: declared here
    static __always_inline bool should_resched(int preempt_offset)
    ^
    kernel/sched/bfs.c: In function ‘__cond_resched_lock’:
    kernel/sched/bfs.c:5290:16: error: too few arguments to function ‘should_resched’
    int resched = should_resched();
    ^
    In file included from include/linux/preempt.h:64:0,
    from include/linux/spinlock.h:50,
    from include/linux/mmzone.h:7,
    from include/linux/gfp.h:5,
    from include/linux/mm.h:9,
    from kernel/sched/bfs.c:31:
    ./arch/x86/include/asm/preempt.h:93:29: note: declared here
    static __always_inline bool should_resched(int preempt_offset)
    ^
    kernel/sched/bfs.c: In function ‘__cond_resched_softirq’:
    kernel/sched/bfs.c:5312:6: error: too few arguments to function ‘should_resched’
    if (should_resched()) {
    ^
    In file included from include/linux/preempt.h:64:0,
    from include/linux/spinlock.h:50,
    from include/linux/mmzone.h:7,
    from include/linux/gfp.h:5,
    from include/linux/mm.h:9,
    from kernel/sched/bfs.c:31:
    ./arch/x86/include/asm/preempt.h:93:29: note: declared here
    static __always_inline bool should_resched(int preempt_offset)
    ^
    make[2]: *** [kernel/sched/bfs.o] Error 1
    make[1]: *** [kernel/sched] Error 2
    make: *** [kernel] Error 2
    make: *** Waiting for unfinished jobs....

    I hope you can answer shortly on how to fix it and/or provide a patch to solve the issue.

    Best regards,
    Manuel Krause

    Replies
    1. By now, I've investigated a little around the related commit (https://github.com/torvalds/linux/commit/fe32d3cd5e8eb0f82e459763374aa80797023403), adapted the changes made to kernel/sched/core.c so that they apply to kernel/sched/bfs.c,
      and uploaded the resulting patch to: http://pastebin.com/Za9cUggs

      I hope this is sufficient to fix the issue. Please report back if not!
      (It's compiling atm, and I'll come back if it causes problems.)

      BR, Manuel Krause
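
      For readers hitting the same build error, here is a minimal sketch of the kind of change that upstream commit requires; it is my own illustration, not the actual pastebin patch. Commit fe32d3cd5e8e added a preempt_offset argument to should_resched(), so the three call sites the compiler flags in kernel/sched/bfs.c have to pass one, just as kernel/sched/core.c does after that commit. The function bodies below follow core.c; the bfs.c bodies may differ slightly, but only the should_resched() calls need to change.

      int __sched _cond_resched(void)
      {
              if (should_resched(0)) {        /* was: should_resched() */
                      preempt_schedule_common();
                      return 1;
              }
              return 0;
      }

      int __cond_resched_lock(spinlock_t *lock)
      {
              /* was: should_resched() */
              int resched = should_resched(PREEMPT_LOCK_OFFSET);
              int ret = 0;

              lockdep_assert_held(lock);

              if (spin_needbreak(lock) || resched) {
                      spin_unlock(lock);
                      if (resched)
                              preempt_schedule_common();
                      else
                              cpu_relax();
                      ret = 1;
                      spin_lock(lock);
              }
              return ret;
      }

      int __sched __cond_resched_softirq(void)
      {
              BUG_ON(!in_softirq());

              if (should_resched(SOFTIRQ_DISABLE_OFFSET)) {   /* was: should_resched() */
                      local_bh_enable();
                      preempt_schedule_common();
                      local_bh_disable();
                      return 1;
              }
              return 0;
      }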

    2. Thanks Manuel for the heads-up. You are looking at the right commit that introduced the compile error; your patch should work, though I can't access it for some reason.

      There are a few commits related to the scheduler code up to v4.2.4; I will find some time this weekend to sync these changes into -gc and -vrq for 4.2.

      BR Alfred

    3. I've only got a little uptime with it, but everything seems to work as well as before (including TuxOnIce hibernation).
      I don't see why you can't access pastebin.com (I still can). I've now uploaded it to:
      http://paste.opensuse.org/92475187
      Hopefully that's more accessible.

      And, yes, of course, there's no miracle in the patch; it's just a simple adaptation.

      BR Manuel

    4. I'd consider the posted preliminary patch stable -- after more uptime + more hibernations.
      BR, Manuel

  4. BTW, didn't you want to describe "Consider cache in task scheduling" some more? Part 3 has been missing for some weeks now.

    BR, Manuel Krause

    Replies
    1. For Part 3, I haven't even started writing the code; I just have a bare idea in mind. It's too late to start it in this release (I'm busy with other stuff); hopefully it fits into the next release.

    2. O.k. -- I'll try to be patient ;-) -- I was just too curious and I'm looking forward to the progress.
      Many thanks and kind regards, as always,

      Manuel Krause
