Alfred Chen's Blog

Monday, September 26, 2016

About Skip List

I read one post about skip list today which link to this web(http://ticki.github.io/blog/skip-lists-done-right/), in which some very useful information can be found.

The "O(1) level generation" is just what I used in my new skiplist implement in BFS, though I never read similar before.

BR Alfred

Saturday, September 17, 2016

Based on the new implementation of skip list, I am adding -VRQ commits upon it. All commits will be pushed in linux-4.7.y-sl branch. Unlike previous -vrq release, I am releasing additional checkpoint tags, so users can use these tags to check their issue and narrow code changes which introducing the issue.

Checkpoint tags:
1. 4.7_0472_sl_baseline and all in one patch.

This is the start point of all the work, all commits are in About skip list in BFS.

2. 4.7_0472_sl_new and all in one patch.

New implementation of skip list, please check the post at New implementation of skip list for BFS.

3. 4.7_0472_sl_new_sync and all in one patch.
Include all Sync-up commits from release to release which not yet be picked up by original BFS.

4. 4.7_0472_sl_new_gc and all in one patch.
This patch include all former -gc commits, most important one is the "
Full cpumask based and LLC sensitive cpu selection" commit, which help with performance under low workload.

5. 4.7_0472_sl_new_vrq and all in one patch.
This is not the same as VRQ3 patch, but only include two major feature "VRQ solution" and "preempt task solution". I want tag a checkpoint here before the latest stick/cache code changes as it is not yet finalized.

6. 4.7_0472_sl_new_vrq_full and all in one patch.
Include all -vrq features upto VRQ3 in 4.7.

...to be continued

BR Alfred

Thursday, September 15, 2016

New implementaion of skip list for BFS

Two commits has been pushed to linux-4.7.y-sl branch.

First one is an embedded rewritten version of CK's original skip list in 0480 bfs. It is similar to the code changes what CK has made in 0497, but my implementation are different in
1. Just keep one data structure, that's skiplist_node.
2. Remove kmalloc/kfree completely.
3. Remove value in skiplist_node and use container_of() to obtain the pointer of structure which skiplist_node is embedded in.

The second commit is a new implementation of skip list. In the new implementation:

A customized search function should be defined using DEFINE_SKIPLIST_INSERT macro and be used for skip list insert operation.

Random Level should be customized implemented and set to node->level then pass to the customized skiplist_insert function.

Levels start at zero and go up to (NUM_SKIPLIST_LEVEL -1).
NUM_SKIPLIST_LEVEL in this implementation is 8 instead of origin 16, considering that there will be 256 entries to enable the top level when using random level p=0.5, and that number is more than enough for a run queue in a scheduler usage. And it also help to reduce the memory usage of the embedded skip list node in task_struct to about 50%.

Based on testing, the first 8 bits in microseconds of niffies are suitable for random level population. find_first_bit() is used to satisfy p = 0.5 between each levels, and there should be platform hardware supported instruction(known as ctz/clz) to speed up this function.
The skiplist level for a task is populated when task is created and doesn't change in task's life time. When task is being inserted into run queue, this skiplist level is set to task's sl_node->level, the skiplist insert function may change it based on current level of the skip lsit.

And there is a lot enhancement in insert/delete function.

The hackbench and kernel compilation sanity both show improvement comparing the baseline version.

You can download the all in one patch of baseline version and this new implementation and have a try.

-- Alfred

Edit: Change typo silently.

Wednesday, September 14, 2016

About skip list in BFS

It's great that CK release huge changes for BFS in 0480 release, the changes not only include cpufreq trigger and cpufreq governor change in recent kernel release but also the most fun part, use skip list to replace the bitmap mask linked queue in global run queue data structure.

May be there are many changes at a time, so BFS 0480 is not settle down yet, most issue are related to cpufreq trigger and cpufreq governor code. I was planed to rework the about 60 -vrq commits upon 0480, but now I'd had to wait till it is calmed down. Meanwhile, I have done some tests for the skiplist design in BFS. That means only three patches in 0480 is used, they are

bfs472-fix_set_task_cpu.patch
skiplists.patch
bfs472-skiplist.patch

For my kernel compilation sanity tests, sl is almost as same as bfs0472, only different is sl is a little bit better than 0472 at 300% workload.

For hackbench(from https://lwn.net/Articles/351058/), 20 groups, 0472 finished at about 4sec, sl finished at 3.6sec. Consider -vrq(0472 based) finished at 3.8sec and cfs at 3.2sec, I can't wait to see how it goes when sl combine with -vrq.

The skiplist design for BFS, IMO, as an initial release, it looks pretty good and very potential. Here are my comments for the skiplist in 0480

#1 In earliest_deadline_task(), RT tasks is now impacted by interactive setting. IMO, it should keep the same behaviors as it is at 0472.

#2 When interactivity is enabled and consider both normal policy tasks and idle policy tasks are in the queue, for example tasks in queue are in this order (N1, N2, D1, D2), in current implement, it's possible that earliest_deadline_task() go through all 4 tasks to pick up one among them. But normal tasks should always has higher priority than idle ones, so just go through the tasks with normal policy(that's the N1, N2 in this example) will be enough.

#3 kmalloc is used for current skip list implementation. It is not a good design for critical code like a scheduler. I have rewriten the skip list implementation to embedded the data structure into the task_struct etc. (PS, CK address this in 0497, meanwhile I have rewritten an embedded implementation too)

#4 Currently, there are same probability that randomLevel() returns 0~15. But "where an element in layer i appears in layer i+1 with some fixed probability p (two commonly used values for p are 1/2 or 1/4)" -- from https://en.wikipedia.org/wiki/Skip_list , so there should be some improvement to adjust the random level.

I have finished #1 and #2 propose enhancement patch and take it as a baseline for my rest SL related embedded/improvement skip list implementation patches. Once the code has been cleaned up, I will push the rest patches to the linux-4.7.y-sl branch in my git repositories.

BR Alfred

Sunday, August 28, 2016

v4.7_0472_vrq3 patch released

v4.7_0472_vrq3 patch is released

The new code changes have been in -test branch since 4.6 release and tested by users since then, bugs has been fixed and improvement has been added. Although currently -test branch is pending on user feedback for further improvement, but IMO, these code changes now on the -test branch are stable enough to be merged into the -vrq branch. So, here it comes the v4.7_0472_vrq3 patch.

This patch is identical with v4.7_0472_test2 except the version print out in dmesg. Comparing to -vrq2, the major feature in vrq3(-test2) is the introduction of preempt stick task to replace the original stick timeout in previous release, which helps with high workload performance and the result can be proved in the sanity test report.

Code has been pushed to bitbucket and github, all-in-one patch is also available.

Enjoy -vrq3, :)

BR Alfred

Monday, August 22, 2016

v4.7_0472_test2 patch released

v4.7_0472_test2 patch has been released, with the following changes
1. Revert "Immediately select preempt task in deactivate_choose_task().", which is reported introduce lag of mouse pointer moving when idle.
2. Adding quick path in pick_other_cpu_stick_task() when no preemptible rq/cpu or just one preemptible rq/cpu.

Enjoy this new release and feedback will be welcome. :)

BR Alfred

EDIT:
It's nice to hear that -test2 fixed issues on -test1, now, base upon -test2, there are debug patches for testing, you can check them out at previous post "debug patches call for testing"

Friday, August 19, 2016

4.7 debug patches, call for testing

I was working on the debug patches in the previous release and try to work out which direction to be taken, but ending up the one last puzzle is still missing, so uploaded three debug patches upon the -test1 patch, so users can help to complete the whole picture.

The stick and cache mechanism for NORMAL policy tasks are both turned on in -test1 patch, here are three debug patches to find out how it goes when these two mechanism on and off independently.

#1 4.7_vrq_test1_debug_s0c0.patch which turn both stick and cache off, it was the debug1 patch in 4.6 release
#2 4.7_vrq_test1_debug_s0c1.patch which turn stick off and cache on for NORMAL policy tasks, it was the debug2 patch in 4.6 release
#3 4.7_vrq_test1_debug_s1c0.patch which turn stick on and cache off, which is the missing puzzle, :)

It will be appreciate for users to compare the test0, test1 and these three debug patches and focus on NORMAL policy task interactivity then provide the feedback.

Enjoying this puzzle game, :)

BR Alfred

EDIT:
-test2 patch has been released and fix issues reported by user on -test1, so these debug patches should be applied upon -test2 patch for testing and comparing to -test2 patch.