IMO, the CPU cache size, especially the LLC (Last Level Cache) size, determines how much data the hardware can cache for the CPU. Looked at another way: given a sequence of task switches, the cache size determines how long a task's data can stay in cache. On a heavily loaded system, where a large number of tasks run at the same time, a CPU with a larger cache will keep tasks' data in cache better than one with a smaller cache. For workloads that want short response time, like gaming, a large cache size doesn't help much. So AMD is right about this, but a large-cache design is good for common workloads in general, not just for workloads like gaming.
Task scheduling should take the CPU cache size (LLC size) into account. The latest 4.2 VRQ branch has a new commit that implements the first version of this code change: it is aware of the LLC size of the CPU and auto-adjusts the cache_scost_threshold (cache switch cost threshold) accordingly. (For the concept of the cache switch cost threshold, please refer to Part 1.) Here is a brief summary of what has been done.
1. Export a scheduler interface to cacheinfo (drivers/base/cacheinfo.c) as a callback, invoked once cacheinfo is ready.
2. Implement a simple linear formula to auto-adjust cache_scost_threshold.
- For now the formula is based on the Intel Core 2 CPU topology and a 64-bit kernel: every 512KB of LLC size increases the CACHE_SCOST_THRESHOLD value by 3.
- For a 32-bit kernel, considering that 32-bit code uses smaller data/instruction sizes, every 256KB of LLC size increases the CACHE_SCOST_THRESHOLD value by 2, but I don't have a 32-bit system to verify this yet.
- Newer CPU topology features like SMT are not yet considered in the formula, because the benchmark results bounce around when SMT is enabled, so it is hard to compare the results of different CACHE_SCOST_THRESHOLD values.
Manual tuning of the threshold is still needed when:
- cacheinfo is not available on your arch (like arm, most non-x86?), or
- the simple linear formula does not cover your CPU topology and you want to find the best value yourself.
... to be continued (Part 3)