Thursday, September 15, 2016

New implementaion of skip list for BFS

Two commits has been pushed to linux-4.7.y-sl branch.

First one is an embedded rewritten version of CK's original skip list in 0480 bfs. It is similar to the code changes what CK has made in 0497, but my implementation are different in
1. Just keep one data structure, that's skiplist_node.
2. Remove kmalloc/kfree completely.
3. Remove value in skiplist_node and use container_of() to obtain the pointer of structure which skiplist_node is embedded in.

The second commit is a new implementation of skip list. In the new implementation:

A customized search function should be defined using DEFINE_SKIPLIST_INSERT macro and be used for skip list insert operation.

Random Level should be customized implemented and set to node->level then pass to the customized skiplist_insert function.

Levels start at zero and go up to (NUM_SKIPLIST_LEVEL -1).
NUM_SKIPLIST_LEVEL in this implementation is 8 instead of origin 16, considering that there will be 256 entries to enable the top level when using random level p=0.5, and that number is more than enough for a run queue in a scheduler usage. And it also help to reduce the memory usage of the embedded skip list node in task_struct to about 50%.

Based on testing, the first 8 bits in microseconds of niffies are suitable for random level population. find_first_bit() is used to satisfy p = 0.5 between each levels, and there should be platform hardware supported instruction(known as ctz/clz) to speed up this function.
The skiplist level for a task is populated when task is created and doesn't change in task's life time. When task is being inserted into run queue, this skiplist level is set to task's sl_node->level, the skiplist insert function may change it based on current level of the skip lsit.

And there is a lot enhancement in insert/delete function.

The hackbench and kernel compilation sanity both show improvement comparing the baseline version.

You can download the all in one patch of baseline version and this new implementation and have a try.

-- Alfred

Edit: Change typo silently.


  1. O.k., good work, "sl_new" is up and running for some hours now. At the moment I want to only post the proof that it's working well on here for now. I haven't faced any issue or interactivity related regression.

    BR Manuel Krause

  2. @Alfred:
    Btw., why do you still call it 472, when you're at 497 code level? Sorry, just kidding a little. You might have your reasons.

    I'm looking forward to your newest revision of good old VRQ on top of skiplist!
    Current sl_new patchset still works fine.
    Please keep up your work!

    BR Manuel Krause

    1. @Manuel
      There is huge code change in 480~497 and it seems that there is still issues with new bfs. So I decided to pick up the skiplist changes from 480/497 first, b/c it has the most value and less risky IMO. And one change at a time helps to isolate the code change when there is problem, and it also help to observe how skiplist impace the scheduler.
      As 4.8 will be released soon, most likely there will be 0472+skiplist+vrq at the earlier of 4.8 cycle.

  3. Hello, and thank you for your work.
    Out of curiosity I've tested your implementation of bfs, and also CK's bfs 497. I've put the results here:

    The throughput is on par with bfs 497 when using the acpi-cpufreq+performance governor, and even slightly better on make -j2. However the cpu frequency signaling part seems broken (as it was on CK's bfs 480) because performance is low when using intel_pstate, but I think you are already aware of that.
    I'll wait for linux 4.8 and for the changes in bfs to settle before doing more tests.


    1. @Pedro
      As I can see in your result that you only test the 4.7 sl_baseline? I guess that's the baseline version in this post?
      If it is correct, this baseline version is the start point I started my work on and without my embedded and new implementation of SL. There are just 5 commits add upon 472, all in my previous blog. You should also try the second patch(, which include all new changes in this post.

      And if so, based on your result in "make -j2", there maybe performance regression under low workload in 0497, and likely caused by other code change beside skiplist changes.
      In my testing, pstate driver shows less performance than acpi driver and even the benchmark of mainline kernel in shows similar result. So I put it in low priority, good news is CK has picked it up and trying to fix it. But there is too many code changes in 480~497, and code doesn't settle down yet. So I am not going to pick up other code chagnes except skiplist at this moment.

    2. That is correct. I tested v4.7_0472_sl_baseline.
      In the meantime I've also run the tests for v4.7_0472_sl_new.
      When using acpi-cpufreq+performance, the results are the same as v4.7_0472_sl_baseline, and indeed it seems there is a regression under low workload.
      However, when using acpi-cpufreq+ondemand, the performance is worse in single-threaded tests and in make -j2.