Monday, August 10, 2015

A big commit added to 4.1 VRQ

As the title says, the big commit is 117d783 bfs: VRQ solution v0.5

I think most of the instability in the previous vrq release was caused by this, and I believe most known issues (on my machines) have been fixed. It has been running stably for two weeks, so you are encouraged to give it a try.

Known issue:
BUG: using smp_processor_id() in preemptible code, call trace from sys_sched_yield().

There are still a few commits left that I haven't reworked yet. I plan to finish them within two weeks, before the new kernel release and another sync-up cycle begins.

BR Alfred


  1. No problems during ~3h of uptime. :-)

    BR, Manuel

  2. Just to "send you some more energy" for your further reworking sessions, here some more POSITIVE feedback:

    Kernel 4.1.5 with all these -gc & -vrq patches is now running fine for more than 24h. No issues at all, and nothing related to your mentioned "known issue: BUG: ..." like you've written above. I followed your opinion (and post-factum's as of his 4.1-pf2 announcement) to _not_ use the revert-unplugged-i/o patch and see how far I'd come without it.

    This afternoon I spent some time trying to reproduce the crashes with data transfers to/from external drives. Lacking a flash/SD card, I used my USB stick to copy larger files to and from, and to extract compressed files from and to, but got no problems at all. Not even stuttering video playback during all these processes. BTW, that's a really great point to mention! :-))) Now: maybe the data rates are too low with my dual core notebook hardware and a USB 2.0 stick? (Approximate overall copying speed to it == 14MiB/s and from it == 28MiB/s, extracting takes much longer, of course.) Oh, and neither file system was Btrfs (they were ext4 and vfat).

    After all of that good experience, I'm really convinced of the usefulness and effectiveness of the so-far-published -vrq patches to reduce desktop latency, improve interactivity and kind of "equalize" bottleneck situations, especially regarding high i/o and high cpu and also both together. That also includes my use of /dev/shm and resulting swapping to disk. I'm using the BFQ disk scheduler and TuxOnIce hibernation together with -vrq.
    At the moment I don't feel the need to compare with plain&pure BFS/CK at all. And a recent test of the 4.2-rc5 kernel with the CFS scheduler (and without BFQ, too) at least showed, how far away the plain kernel is from desirable AND achievable targets regarding low latency.

    Dear Alfred, please keep up your wonderful work!

    BR Manuel Krause

  3. Hi Alfred,

    great news, thanks for your great work !

    I've quickly looked for [via Google] a description of VRQ but haven't found a short one:

    is it an optimized locking mechanism which improves scalability and performance ?

    Are there any changes related to SMT_NICE and VRQ ?

    Will see if I find some time to update to BFS & VRQ within the next days (or even weeks) and give it a good testing

    Thanks !

    1. @kernelOfTruth
      Caught you! How is your "btrfs scrub" issue going? CK released a reverse_unplug patch which is similar to your previous "trial patch". I want to know how your issue is going and whether CK's patch resolves it.

    2. Hi Alfred,

      I wiped Btrfs from that particular Backup drive - so now only ZFS is remaining as Filesystem for valuable data (Btrfs is still on / [root], /usr/portage and /var/tmp [in RAM]),

      scrubs for ZFS and rsync haven't shown any issues so far

      So yes that patch resolved it (using BFS 463) :)


    3. @kernelOfTruth: Have you already tried to remove Con's patch and use Alfred's modification from
      I've had it running on top of the -vrq solution as of before yesterday's updates for three days without related problems.

      BR Manuel Krause

    4. Hi Manuel,

      I haven't but I'm running latest VRQ 0.5 now,

      so that patch is needed to address the heavy i/o lockups ?

      Oh fun, another round of compiling :P


    5. Hi, kernelOfTruth,
      I don't know if -vrq would fix the issue by itself. In the other threads (including CK's blog), post-factum has written, that with Con's fix his machine worked well (and NOT without it) and he's testing Alfred's new one at the moment (...until something happens). He's using the plain -gc branch, I assume.
      With the complete new -vrq as of today +Alfred's patch I have had no issues at all in ~4h of run time.
      Maybe you can use & test the above mentioned patch, if ever you face errors you've had in the past when you're running it without Con's patch atm.

      I just only wanted to invite you for another testing challenge. Never mind. ;-)

    6. Hi Manuel,

      I'm generally interested in testing new and exciting things =)

      but I'm currently occupied with tracking down an issue with ZFS and I've also got some deep scientific research in top priority - so not much additional time ;)

      Yes, I had issues on Con's 463 BFS without his patch - so I assume that I also need Alfred's approach with VRQ

    7. @kernelOfTruth
      If you still have a similar "unplugged io" issue, try CK's patch first. If it fixes the problem, please try my replacement patch at and see whether it works too.

      BR Alfred

    8. @Alfred:
      Seems like you're not completely convinced of your approach, when you lead kernelOfTruth to test Con's patch first... ? If he's too busy -- shouldn't he verify the usefulness of your patch first, especially under the aspect, that post-factum already confirmed that -gc worked well on his machine with Con's patch?

      BR Manuel

    9. Well, I want both patches tested. But given the limited time for testing and the need to prioritize between these two patches, I'd suggest CK's first, as it's tested and more likely to fix the issue, although there may be another, better way if there is more time to find it.

    10. O.k., looking forward to the other "better" way, as always... :-))
      BTW, the new complete -vrq keeps running fine now for ~ 23h.

      BR Manuel

    11. Last message is now superseded by post-factum's posting in the other thread on here: Alfred's patch would not heal the issue. :-(

    12. That's weird - I did a full rsync of 2 TiB to a (newly created) Btrfs backup drive

      and with applied on top of VRQ 0.5

      no hardlocks or error messages observed

      meanwhile I mostly modified the rsync settings to

      rsync -ai --delete --stats
      rsync -aiz --delete --stats

      previously it was

      rsync -ai -W --inplace --delete --stats


      rsync -aiz -W --inplace --delete --stats

      so those could have been the settings that led to lots of stress in the past

    13. And... If you used the previous rsync commands with the new VRQ kernel, would you see hardlocks/ errors again?
      Don't know if that test makes sense for you atm.

      BR, Manuel

    14. I did try both (the old ones seemed to slow down rsync to some degree so I modified them to the new ones) - and still no hardlock

      but I'll see if I find the time to let it run as one great job on the whole partition (harddrive) with that (old) command if that provokes something.

      Usually I let it run split up in several smaller jobs for bigger (sub-)folders to have a better overview (and perhaps speed-up).

    15. The thing is:

      I don't have that data (or: partition) with Btrfs that seemed to provoke that hardlock,

      for some time I ran ZFS on that harddrive but realized that it's better to have at least two different filesystems in case something gets messed up with one

      so I formatted that backup drive and re-formatted it with Btrfs.

      The difference now is that the Btrfs code changed and the data is more up-to-date:

      Triggers could have been the Btrfs or the data on the Btrfs partition.

      If the latter was the case it might no longer trigger ...

    16. Ah crap,

      so it wasn't rsync (that had been several months or even years back) as the trigger,


      btrfs scrub


      will see if I can let a btrfs scrub run over night (I need the box right now for work)

    17. Okay,

      so I got the NULL pointer dereference error during scrub of the said partition (close to 2 TiB)

      it occurred pretty quickly after 1-2 minutes


      was also set

    18. @kernelOfTruth:
      Thank you very much for your additional time for testing -- and for providing traces! Let's hope Alfred had a nice weekend and will find time to elaborate a fix (together with the trace from post-factum).

      Best regards,

    19. Hope it helps, besides this VRQ runs quite well (no lockup with ZFS scrub if I recall correctly)

      In the other thread the mount options were requested:


      the volume is on top of cryptsetup_luks

      btrfs-progs v4.1.2

      Btrfs code is pretty bleeding edge but it shouldn't make a difference since it occurred with earlier Btrfs code

    20. @kernelOfTruth
      Thanks for testing and providing the trace. I guess sched_submit_work() doesn't work for BFS because BFS uses grq_lock instead of mainline's task_lock(), which is a combination of the task's pi_lock and rq->lock, so checking tsk_is_pi_blocked(tsk) alone is not enough for BFS. Here comes an enhancement patch that adds more checking. Please apply it on top of the gc-branch code and see if it works.

    21. I assume, we shouldn't use the first sched_submit_work.patch any more? Right?

    22. Yes. I have removed the sched_submit_work.patch from the download page.

  4. Another possible trigger is to create a stage4 tarball (backing up the system)


    time (tar -cp / -X /root/scripts/stage4.excl | 7z a -si -t7z -m0=lzma2 -mx=7 -mfb=64 -md=64m -ms=on -mmt=8 /home/system/stage4.t7z)

    haven't tested if both that and the scrub lead to a positive (hard)lock

    1. You name this as another possible trigger scenario? But not, running it at the same time with btrfs scrub. Right?

    2. Some other questions before I try this:
      * Your -X exclusion list... Is it a public one or your private? I don't know what is needed to exclude.
      * The -mmt value is set according to your No. of cores?
      * Why is it called "stage4"?

      Thanks for clarification and best regards,
      Anonymous Manuel Krause ;-)

    3. Mmmh. was just too curious that I've tried on my own with a directory with 1.2 GiB first.
      time (tar -cp /directory/with/1_2GiB/ | 7z a -si -t7z -m0=lzma2 -mx=7 -mfb=64 -md=64m -ms=on -mmt=2 /place/for/1_2GiB.t7z)

      No issues on here. Also, video playback from a German "Mediathek" in flashplayer within firefox had only low stuttering during that process.

      Best regards,

    4. Yes, it's another possible trigger scenario,

      not concurrently, but separately; there were also certain rsync jobs running, but that doesn't seem to apply here



      There were issues with the restored system when including /dev/* in that list, so I deliberately left it out

      Also I've a separate backup command for /boot, but that doesn't matter for this purpose - it's simply for causing a high i/o, cpu and scheduler load

      yes, mmt equals the cores, afaik it should do it automatically (?) but I remember having had issues in the past without it (less throughput)

      It's rooted in Gentoo's stages and backup procedures

      stage4 in that case would be fully installed and configured system :)

      stage3 is where you usually start when following the gentoo handbook
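
How an exclusion list passed with -X behaves can be tried safely on a scratch directory before pointing it at /. The following is only a minimal sketch: the helper name make_backup and all paths are hypothetical, plain tar is used without the 7z stage from the command above, and the pattern file simply mirrors the /dev/* exclusion mentioned in this comment.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): archive directory "$1" into
# tarball "$2", skipping every pattern listed in the exclusion file "$3".
make_backup() {
    src=$1
    out=$2
    excl=$3
    # -c create, -p preserve permissions, -f write to the given file
    # -X read one exclusion pattern per line from "$excl"
    # -C change into "$src" first so member names are relative ("./etc/...")
    tar -cpf "$out" -X "$excl" -C "$src" .
}

# Example call with assumed paths; exclude.txt would contain lines such as
#   dev/*
#   proc/*
# make_backup /mnt/scratch /tmp/scratch.tar /tmp/exclude.txt
```

Listing the result with tar -tf shows whether the patterns excluded what was intended before any long-running backup is started.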

    5. Need to add: all involved partitions are ext4. ^^ *MK

    6. No one could ever count on crossposting. But especially on here? ;-)

      You've seen, that I've done some compressing with ext4 partitions' content without issues. It was only about 1.2 GiB.

      Thank you for your added info.


    7. @kernelOfTruth & @post-factum:

      Now it seems to be up to you to prove that the new

      works for you even on btrfs scrub.
      I'm running it on the -vrq branch, btw.

      Thank you all for your participation,


    8. Will test perhaps at the weekend or earlier,

      the lockups would mostly occur with Btrfs,

      I haven't used ext4 for a long time so I'm not sure if there are still quirks with it

      Crosses fingers that this fixes it =)

    9. I don't see/ feel negative subjective experiences with -vrq and the new patch. Uptime ~9h.

      BR Manuel

    10. Compiling and testing sched_submit_work_02.patch, stay tuned.

    11. Stupid blogger interface ?

      Where did my post go ?


      Great news !

      it survived the first 2 minutes and finished without hardlocks (5-6 hours)

      Once there's enough changes to the system I'll attempt another stage4 backup and see whether that hardlocks the system - but I doubt it will :)

      Awesome work !

    12. Also:

      pf@defiant:~ » uptime
      16:57:31 up 5:43, 1 user, load average: 3.51, 1.92, 1.17
      pf@defiant:~ » sudo btrfs scrub status /
      scrub status for 14140a7f-23bc-4dab-b263-f2f46f5d70aa
      scrub started at Tue Aug 25 16:55:10 2015 and finished after 00:02:15
      total bytes scrubbed: 76.83GiB with 0 errors

      Still works OK, but uptime is too small, need more time.

    13. Thanks to all of you for testing. While waiting for pf's final confirmation, I'd like to prepare another patch for testing.

      BR Alfred

    14. Just had a hardlock during ZFS snapshot send:

      Aug 26 00:29:13 morpheus kernel: [69082.418467] INFO: rcu_preempt detected stalls on CPUs/tasks:
      Aug 26 00:29:13 morpheus kernel: [69082.418477] 4: (0 ticks this GP) idle=9f9/140000000000000/0 softirq=3923228/3923228 fqs=12328 last_accelerate: f53f/85c8, nonlazy_posted: 0, L.
      Aug 26 00:29:13 morpheus kernel: [69082.418481] 5: (1 GPs behind) idle=8c7/140000000000001/0 softirq=2298621/2298622 fqs=12328 last_accelerate: f53f/85c8, nonlazy_posted: 0, L.
      Aug 26 00:29:13 morpheus kernel: [69082.418482] (detected by 3, t=37002 jiffies, g=1688364, c=1688363, q=13497)
      Aug 26 00:29:13 morpheus kernel: [69082.418485] Task dump for CPU 4:
      Aug 26 00:29:13 morpheus kernel: [69082.418486] irq/23-ehci_hcd R running task 0 353 2 0x00000008
      Aug 26 00:29:13 morpheus kernel: [69082.418488] ffffffff81e796ae ffffffff81e7b192 0000000000000003 ffff8807f9850000
      Aug 26 00:29:13 morpheus kernel: [69082.418490] ffff8800cf1a0000 ffff8800cf19fd68 ffff8807f4b2cf00 ffff8807f4e40800
      Aug 26 00:29:13 morpheus kernel: [69082.418492] ffff8807f4e40800 ffff8800cf1a0000 ffffffff8114d640 ffff8800cf19fd88
      Aug 26 00:29:13 morpheus kernel: [69082.418494] Call Trace:
      Aug 26 00:29:13 morpheus kernel: [69082.418508] [] ? __schedule+0x11ae/0x2c60
      Aug 26 00:29:13 morpheus kernel: [69082.418510] [] ? schedule+0x32/0xc0
      Aug 26 00:29:13 morpheus kernel: [69082.418513] [] ? irq_thread_fn+0x40/0x40
      Aug 26 00:29:13 morpheus kernel: [69082.418516] [] ? usb_hcd_irq+0x21/0x40
      Aug 26 00:29:13 morpheus kernel: [69082.418517] [] ? irq_forced_thread_fn+0x2e/0x70
      Aug 26 00:29:13 morpheus kernel: [69082.418519] [] ? irq_thread+0x13f/0x170
      Aug 26 00:29:13 morpheus kernel: [69082.418520] [] ? wake_threads_waitq+0x30/0x30
      Aug 26 00:29:13 morpheus kernel: [69082.418521] [] ? irq_thread_dtor+0xb0/0xb0
      Aug 26 00:29:13 morpheus kernel: [69082.418524] [] ? kthread+0xf2/0x110
      Aug 26 00:29:13 morpheus kernel: [69082.418528] [] ? sched_clock+0x9/0x10
      Aug 26 00:29:13 morpheus kernel: [69082.418530] [] ? kthread_create_on_node+0x2f0/0x2f0
      Aug 26 00:29:13 morpheus kernel: [69082.418532] [] ? ret_from_fork+0x42/0x70
      Aug 26 00:29:13 morpheus kernel: [69082.418533] [] ? kthread_create_on_node+0x2f0/0x2f0
      Aug 26 00:29:13 morpheus kernel: [69082.418534] Task dump for CPU 5:
      Aug 26 00:29:13 morpheus kernel: [69082.418535] irq/33-xhci_hcd R running task 0 840 2 0x00000008
      Aug 26 00:29:13 morpheus kernel: [69082.418537] 0000000000000003 ffff88066ef1eb80 ffff8800be358000 00000000f9852300
      Aug 26 00:29:13 morpheus kernel: [69082.418539] 00000000296b0ad0 ffff8807f5593d68 ffff8807f550d100 ffff8807f51c5a00
      Aug 26 00:29:13 morpheus kernel: [69082.418541] ffff8807f51c5a00 ffff8807f50d4600 ffffffff8114d640 ffff8807f5593d88
      Aug 26 00:29:13 morpheus kernel: [69082.418543] Call Trace:
      Aug 26 00:29:13 morpheus kernel: [69082.418544] [] ? irq_thread_fn+0x40/0x40
      Aug 26 00:29:13 morpheus kernel: [69082.418557] [] ? xhci_msi_irq+0xc/0x10 [xhci_hcd]
      Aug 26 00:29:13 morpheus kernel: [69082.418558] [] ? irq_forced_thread_fn+0x2e/0x70
      Aug 26 00:29:13 morpheus kernel: [69082.418559] [] ? irq_thread+0x13f/0x170
      Aug 26 00:29:13 morpheus kernel: [69082.418561] [] ? wake_threads_waitq+0x30/0x30
      Aug 26 00:29:13 morpheus kernel: [69082.418562] [] ? irq_thread_dtor+0xb0/0xb0
      Aug 26 00:29:13 morpheus kernel: [69082.418563] [] ? kthread+0xf2/0x110
      Aug 26 00:29:13 morpheus kernel: [69082.418565] [] ? sched_clock+0x9/0x10
      Aug 26 00:29:13 morpheus kernel: [69082.418567] [] ? kthread_create_on_node+0x2f0/0x2f0
      Aug 26 00:29:13 morpheus kernel: [69082.418568] [] ? ret_from_fork+0x42/0x70
      Aug 26 00:29:13 morpheus kernel: [69082.418570] [] ? kthread_create_on_node+0x2f0/0x2f0
      Aug 26 00:32:17 morpheus kernel: [ 0.000000] Initializing cgroup subsys cpuset

      looks like it's most likely not related to the scheduler, no ?

    15. @kernelOfTruth
      Most likely not. But I'm sure it's not the unplugged_io issue we are tracing.

    16. I think, no bad news from post-factum is good news? Isn't it?

      What about the new patch you've mentioned August 25, 2015 at 8:20 AM -- or are you still investigating, whether kernelOfTruth's traces may be scheduler related or not?

      BR Manuel

    17. This comment has been removed by the author.

    18. > no bad news from post-factum is good news? Isn't it?

      Oh, jerk off with that :/. As if I bring bad news only.

      Anyway, second patch still works OK for me.

    19. @post-factum:
      Sorry, you've definitely got me wrong. I meant: As long as we don't get lockup messages from your side, everything seems good for the time you're doing testing until now. Longer, but more precisely.

      I didn't intend to say that you're only bringing bad news.
      I really appreciate your work and testing time and would never want to be impolite to you,

      Best regards,

    20. Maybe I also misused the word "bad". I just see the other side of the medal, too: Even "bad" news, those regarding failures, are "good" news -- as they would lead to fixes, sooner or later, for our beloved Linux operating system.

      Best regards,

    21. I'm still investigating the unplugged_io patch and trying to improve it. As for kernelOfTruth's new ZFS trace, I believe the rcu preempt checking most likely happens at schedule time, so it's hard to tell whether it's a scheduler issue.

      For the next patch for testing, currently I think preempt should be disabled for the additional checking but it may impact performance, so I need a benchmark to see how it goes. I'll start a new post once it is done. This one is growing long and off-topic, :)

    22. @Manuel, take it easy :).

      @Alfred, more patches to test are coming?

    23. Just wrote a new post about the issue. In short, no new patches for testing, the last one seems good.

    24. @Alfred:

      most likely related to threadirqs (as I expected), I got another hardlock during attempt of transferring ZFS snapshots (around 400 GiB out of 2 TiB - so I have to start over again XD )

      this time without threadirqs

      related thread: [3.13 <= rc6. Using USB 2.0 devices is braking the system when using "threadirqs" kernel optio]

    25. @kernelOfTruth:
      Although Alfred already named this thread getting off-topic... some new off-topic comment ;-)

      I'm also using the threadirqs kernel command line option and have not seen direct(!) negative effects. This refers to my postings especially regarding my tests from August 24th+. These involved a USB 2.0 stick? drive (FAT formatted for compatibility reasons; friends^^).

      Have you been able to finish the transfer without the "threadirqs" option successfully?
      (The lkml thread is... some kind of... old? Do you think it's still relevant for the issue? Honest question.)

      BTW, I'm still searching for "something" (driver, setting, patch e.g.) responsible for TuxOnIce being unreliable sometimes. What I've seen is, that reliability got much better with a) kernel 4.1 up to 4.1.6, b) Alfred's -gc enhancements, equal or better with: c) the -vrq patches' addons. The -vrq patched kernel may fail really rarely, but if it then failed, once in ~one week with ~21 hibernations, the TuxOnIce image is gone.

      Best regards,
      Manuel Krause
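
When comparing runs with and without threadirqs, it helps to confirm which command line the running kernel was actually booted with; Linux exposes it in /proc/cmdline. A minimal sketch, where the helper name has_param is hypothetical and only exact whole-word options are matched:

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): succeed if option "$2" appears
# as a whole word in the kernel command line string "$1".
has_param() {
    # Unquoted $1 is split on whitespace, one word per boot option.
    for word in $1; do
        if [ "$word" = "$2" ]; then
            return 0
        fi
    done
    return 1
}

# On a live system:
# if has_param "$(cat /proc/cmdline)" threadirqs; then
#     echo "threadirqs is active"
# else
#     echo "threadirqs is not set"
# fi
```

Noting the result next to each test report makes it easier to tell lockups seen with forced IRQ threading apart from those seen without it.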

    26. @Manuel:

      not sure where your post did go: yes that change "fixed" it for me,


      to calm your mind: the lockup I experienced during the ZFS send (twice) is not scheduler related - well, it appears to be to some point but the focus lies on other system parts (rcu, IRQs, hardware, drivers, etc.)

      so it's not caused by BFS or your BFS changes :)

      Thanks !

  5. Mmmh. I've written a comment to the long thread above last night, but can't see it. And the comment count increased by 1 then (and by 2) until now. Can't see the reply. Strange interface.

    BR Manuel

    1. Aaaahhh. O.k. Forget my posting. I've just read the switch "Load more" at the very bottom of the page.