Friday, September 25, 2015

Consider cache in task scheduling Part 2

In this part, let's look at the first factor in task caching -- the CPU cache size. Speaking of cache, years ago there was a fight between Intel and AMD about cache size in their CPU designs. Intel tends to use a large CPU cache while AMD uses a smaller one. I remember one of AMD's explanations was that gaming software doesn't use a large cache, and I kind of agree with that.

IMO, the CPU cache size, especially the LLC (Last Level Cache) size, determines how much data the hardware can cache for the CPU. Looked at another way, given a sequence of task switches, the cache size determines how long a task's data can be kept in cache. On a system with a heavy workload, where a large number of tasks run at the same time, a CPU with a larger cache will keep task data in cache longer than one with a smaller cache. For workloads that need short response times, like gaming, a large cache doesn't help much. So AMD is right about that, but a large-cache design is good for common workloads, not just for workloads like gaming.

Task scheduling should take the CPU cache size (LLC size) into account. In the latest 4.2 VRQ branch there is a new commit that implements the first version of this change: it is aware of the CPU's LLC size and auto-adjusts cache_scost_threshold (the cache switch cost threshold) based on it. (For the concept of the cache switch cost threshold, please refer to Part 1.) Here is a brief summary of what has been done.

1. Export a scheduler interface to cacheinfo (drivers/base/cacheinfo.c) as a callback that is invoked once cacheinfo is ready.
2. Implement a simple linear formula to auto-adjust cache_scost_threshold (a sketch of this is shown below).
  • The formula is currently based on the Intel Core 2 CPU topology and a 64-bit kernel: every 512KB of LLC size increases the CACHE_SCOST_THRESHOLD value by 3.
  • For a 32-bit kernel, considering that 32-bit code uses smaller data/instruction sizes, every 256KB of LLC size increases the CACHE_SCOST_THRESHOLD value by 2, but I don't have a 32-bit system to prove this yet.
  • Newer CPU topologies such as SMT are not yet considered in the formula, because the benchmark results bounce around when SMT is enabled, which makes it hard to compare results for different CACHE_SCOST_THRESHOLD values.
3. A kernel boot parameter "cache_scost_threshold" is introduced; it can be used to manually assign the CACHE_SCOST_THRESHOLD value to the scheduler if
  • cacheinfo is not available on your arch (like ARM, in most cases?) or
  • the simple linear formula doesn't cover your CPU topology and you want to find the best value yourself.
It's still at its first version; in the next version I'd like to complete the formula to cover SMT topology.
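
To make the above more concrete, here is a minimal sketch of how the cacheinfo callback, the linear adjustment and the boot parameter override could fit together. This is only my illustration of the description above, not the actual -vrq commit: the function names, the rounding and the lack of a base value are assumptions; only the 512KB-per-3 / 256KB-per-2 relation and the "cache_scost_threshold" parameter name are taken from the list above.

/*
 * Hypothetical sketch -- not the actual -vrq code.
 * It shows: a callback that drivers/base/cacheinfo.c could invoke once
 * the LLC size is known, a linear mapping from LLC size to the cache
 * switch cost threshold, and a kernel command line override.
 */
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/types.h>

static int cache_scost_threshold;   /* cache switch cost threshold */
static bool cache_scost_forced;     /* true when set via boot parameter */

/* 64-bit: every 512KB of LLC adds 3; 32-bit: every 256KB adds 2 */
static void update_cache_scost_threshold(unsigned int llc_size_kb)
{
        if (cache_scost_forced)
                return;
#ifdef CONFIG_64BIT
        cache_scost_threshold = llc_size_kb / 512 * 3;
#else
        cache_scost_threshold = llc_size_kb / 256 * 2;
#endif
}

/* interface exported to cacheinfo, called back once cacheinfo is ready */
void sched_cacheinfo_ready(unsigned int llc_size_kb)
{
        update_cache_scost_threshold(llc_size_kb);
}

/* manual override: cache_scost_threshold=<n> on the kernel command line */
static int __init cache_scost_threshold_setup(char *str)
{
        if (!kstrtoint(str, 0, &cache_scost_threshold))
                cache_scost_forced = true;
        return 1;
}
__setup("cache_scost_threshold=", cache_scost_threshold_setup);

With this mapping, for example, a 3MB LLC gives 3072 / 512 * 3 = 18, which matches the default value mentioned in the comments below, and a 6MB LLC gives 36.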

... to be continued (Part 3)

17 comments:

  1. Thank you very much for your explanations of what happens under the covers of the -vrq code.

    The CACHE_SCOST_THRESHOLD value leads me to the following question: how would you describe the expected behaviour of increasing or decreasing this value, from your programmer's point of view (plus maybe from your testing experience)? In particular, would it -- and if so, how -- affect performance / throughput / interactivity?

    Thank you in advance and best regards,
    Manuel Krause

    Replies
    1. The CACHE_SCOST_THRESHOLD value should just focus on performance and throughput, measured by benchmarks/testing etc. Don't take interactivity into account; I'd like to take care of interactivity for different kinds of tasks and kernel configs (system intention) in Part 3.

    2. It's a very sensitive setting, I need to say, also for other sub-systems. Going to 12 from the default [18] made TOI stop working. Lowering the value seemed to make the system "subjectively" faster and more interactive. The last known good value was 16, set manually via the kernel command line.

      Manuel

    3. @Manuel
      I think you don't need to set this boot parameter manually, as your CPU topology is covered by the current formula. Normally I prefer an out-of-the-box design and don't like to expose too many configurable things, but cacheinfo is not well supported across the different arches.

    4. Your blogspot interface is failing again. If the next posting comes twice, please, delete the first. Manuel Krause

    5. Of course, I didn't have the need to change it manually. Personally, I don't like black-box systems, and I appreciate the possibility you've introduced to play a bit with it. At least, no one else among us knows how you generated the factor(s) of the CACHE_SCOST_THRESHOLD formula.
      But, in the end, it turns out that your automatism was wisely chosen, with regard to BOTH stability and performance.
      Really good work, and I'm looking forward to your next chapter (Part 3).

      Kind regards,
      Manuel

    6. This -- my last -- statement ONLY applies to MY one system: an Intel Core 2 Duo dual-core CPU with integrated gfx, no HT, no SMT (and also not enabled in the kernel .config).

      So, you others are still encouraged to test current values for your system. This means: WE NEED MORE TESTERS !!!

      Best regards,
      Manuel

  2. Hi,
    I have an i7-5960X and would like to do some benchmarking with the patch.
    What exactly are you using to test performance?

    Jan

    Replies
    1. Sorry for the late reply, I was on vacation. Currently I just use my simple sanity test scripts to do kernel compiles under different workloads. I have uploaded them to https://bitbucket.org/alfredchen/linux-gc/downloads/sanity and https://bitbucket.org/alfredchen/linux-gc/downloads/compile_throughput

      If you want to use the scripts, you may need to set up your test kernel source tree, put your test kernel config there, and modify the script to point at the path of the kernel source tree.

      And PS, to minimize the impact of other activity on the system, I usually boot into console mode to run the sanity tests.

      I hope this info is helpful for you.

      BR Alfred

  3. Hi Alfred,
    sooo much time with your patches without any issues -- that's a really nice experience! :-)))

    Today I tried compiling a 4.2.4-patched kernel and got the following errors (sorry for the possibly mangled whitespace):
    ...
    CC kernel/sched/bfs.o
    CC arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.o
    kernel/sched/bfs.c: In function ‘_cond_resched’:
    kernel/sched/bfs.c:5272:6: error: too few arguments to function ‘should_resched’
    if (should_resched()) {
    ^
    In file included from include/linux/preempt.h:64:0,
    from include/linux/spinlock.h:50,
    from include/linux/mmzone.h:7,
    from include/linux/gfp.h:5,
    from include/linux/mm.h:9,
    from kernel/sched/bfs.c:31:
    ./arch/x86/include/asm/preempt.h:93:29: note: declared here
    static __always_inline bool should_resched(int preempt_offset)
    ^
    kernel/sched/bfs.c: In function ‘__cond_resched_lock’:
    kernel/sched/bfs.c:5290:16: error: too few arguments to function ‘should_resched’
    int resched = should_resched();
    ^
    In file included from include/linux/preempt.h:64:0,
    from include/linux/spinlock.h:50,
    from include/linux/mmzone.h:7,
    from include/linux/gfp.h:5,
    from include/linux/mm.h:9,
    from kernel/sched/bfs.c:31:
    ./arch/x86/include/asm/preempt.h:93:29: note: declared here
    static __always_inline bool should_resched(int preempt_offset)
    ^
    kernel/sched/bfs.c: In function ‘__cond_resched_softirq’:
    kernel/sched/bfs.c:5312:6: error: too few arguments to function ‘should_resched’
    if (should_resched()) {
    ^
    In file included from include/linux/preempt.h:64:0,
    from include/linux/spinlock.h:50,
    from include/linux/mmzone.h:7,
    from include/linux/gfp.h:5,
    from include/linux/mm.h:9,
    from kernel/sched/bfs.c:31:
    ./arch/x86/include/asm/preempt.h:93:29: note: declared here
    static __always_inline bool should_resched(int preempt_offset)
    ^
    make[2]: *** [kernel/sched/bfs.o] Error 1
    make[1]: *** [kernel/sched] Error 2
    make: *** [kernel] Error 2
    make: *** Waiting for unfinished jobs....

    I hope you can answer shortly on how to fix it and/or provide a patch to solve the issue.

    Best regards,
    Manuel Krause

    Replies
    1. By now, I've investigated a little around the related commit (https://github.com/torvalds/linux/commit/fe32d3cd5e8eb0f82e459763374aa80797023403), adapted the changes made to kernel/sched/core.c so that they apply to kernel/sched/bfs.c,
      and uploaded the resulting patch to: http://pastebin.com/Za9cUggs

      I hope this is sufficient to fix the issue. Please report back if not!
      (It's compiling atm, and I'll come back if it causes problems.)

      BR, Manuel Krause
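
      For readers hitting the same build error, here is a minimal sketch of the kind of change that upstream commit requires; it is my own illustration, not the actual pastebin patch. Commit fe32d3cd5e8e added a preempt_offset argument to should_resched(), so the three call sites the compiler flags in kernel/sched/bfs.c have to pass one, just as kernel/sched/core.c does after that commit. The function bodies below follow core.c; the bfs.c bodies may differ slightly, but only the should_resched() calls need to change.

      int __sched _cond_resched(void)
      {
              if (should_resched(0)) {        /* was: should_resched() */
                      preempt_schedule_common();
                      return 1;
              }
              return 0;
      }

      int __cond_resched_lock(spinlock_t *lock)
      {
              /* was: should_resched() */
              int resched = should_resched(PREEMPT_LOCK_OFFSET);
              int ret = 0;

              lockdep_assert_held(lock);

              if (spin_needbreak(lock) || resched) {
                      spin_unlock(lock);
                      if (resched)
                              preempt_schedule_common();
                      else
                              cpu_relax();
                      ret = 1;
                      spin_lock(lock);
              }
              return ret;
      }

      int __sched __cond_resched_softirq(void)
      {
              BUG_ON(!in_softirq());

              if (should_resched(SOFTIRQ_DISABLE_OFFSET)) {   /* was: should_resched() */
                      local_bh_enable();
                      preempt_schedule_common();
                      local_bh_disable();
                      return 1;
              }
              return 0;
      }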

    2. Thanks Manuel for the heads-up. You are looking at the right commit that introduced the compile error; your patch should work, though I can't access it for some reason.

      There are a few commits related to the scheduler code up to v4.2.4; I will find some time this weekend to sync these changes into -gc and -vrq for 4.2.

      BR Alfred

    3. I've only got a little uptime with it, but everything seems to work as well as before (including TuxOnIce hibernation).
      I don't see why you can't access pastebin.com (I still can). I've now uploaded it to:
      http://paste.opensuse.org/92475187
      Hopefully that's more accessible.

      And, yes, of course, there's no miracle in the patch; it's just a simple adaptation.

      BR Manuel

    4. I'd consider the posted preliminary patch stable -- after more uptime + more hibernations.
      BR, Manuel

  4. BTW, didn't you want to describe "Consider cache in task scheduling" some more? Part 3 has been missing for some weeks now.

    BR, Manuel Krause

    Replies
    1. For Part 3, I haven't even started writing the code; I just have a bare idea in mind. It's too late to start it in this release (I'm busy with other stuff); hopefully it fits into the next release.

    2. O.k. -- I'll try to be patient ;-) -- I was just too curious and I'm looking forward to the progress.
      Many thanks and kind regards, as always,

      Manuel Krause
