Comments on Alfred Chen's Blog: "A big commit added to 4.1 VRQ"

---

Anonymous — 2015-09-01 12:32:
Aaaahhh. O.k. Forget my posting. I've just read the switch "Load more" at the very bottom of the page.

---

Manuel (Anonymous) — 2015-09-01 12:28:
Mmmh. I've written a comment to the long thread above last night, but can't see it. And the comment count increased by 1 then (and by 2 by now), yet I still can't see the reply. Strange interface.

BR Manuel

---

kernelOfTruth — 2015-09-01 11:48:
@Manuel:

Not sure where your post went. Yes, that change "fixed" it for me.

@Alfred:

To calm your mind: the lockup I experienced during the ZFS send (twice) is not scheduler related. Well, it appears to be to some extent, but the focus lies on other system parts (RCU, IRQs, hardware, drivers, etc.).

So it's not caused by BFS or your BFS changes :)

Thanks!

---

Manuel Krause (Anonymous) — 2015-08-31 13:01:
@kernelOfTruth:
Although Alfred has already called this thread off-topic... some new off-topic comment ;-)

I'm also using the threadirqs kernel command line option and have not seen direct(!) negative effects. This refers to my postings, especially regarding my tests from August 24th onward. Those involved a USB 2.0 stick(?) drive (FAT-formatted for compatibility reasons; friends^^).

Have you been able to finish the transfer successfully without the "threadirqs" option?
(The lkml thread is somewhat old. Do you think it's still relevant for the issue? Honest question.)

BTW, I'm still searching for "something" (a driver, setting, or patch, e.g.) responsible for TuxOnIce being unreliable sometimes. What I've seen is that reliability got much better with a) kernel 4.1 up to 4.1.6, b) Alfred's -gc enhancements, and equal or better with c) the -vrq patches' addons. The -vrq patched kernel fails really rarely, but when it did fail (once in about one week with ~21 hibernations), the TuxOnIce image was gone.

Best regards,
Manuel Krause

---

kernelOfTruth — 2015-08-29 12:25:
@Alfred:

Most likely related to threadirqs (as I expected): I got another hardlock during an attempt to transfer ZFS snapshots (around 400 GiB out of 2 TiB, so I have to start over again XD).

This time without threadirqs.

Related thread: https://lkml.org/lkml/2013/12/31/144 ["3.13 <= rc6. Using USB 2.0 devices is breaking the system when using 'threadirqs' kernel option"]

---

Alfred Chen — 2015-08-27 19:28:
I've just written a new post about the issue. In short, no new patches for testing; the last one seems good.
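---

[Editor's note] Since the threadirqs option keeps coming up in this thread, here is a small sketch (not from the original comments) of how one can check whether forced IRQ threading is actually in effect on a running Linux system. It only reads standard procfs paths; with threadirqs active, hardirq handlers run in kernel threads named irq/<nr>-<name>, like the irq/23-ehci_hcd thread that appears in the stall trace quoted in this thread.

```shell
# Sketch: check whether forced IRQ threading ("threadirqs") is in effect.
# Reads /proc/cmdline and lists irq/* kernel threads; purely informational.
if grep -qw threadirqs /proc/cmdline 2>/dev/null; then
    echo "threadirqs: enabled on the kernel command line"
else
    echo "threadirqs: not set"
fi
# List any threaded-IRQ kernel threads (the list may be empty when
# threadirqs is off and no driver requests a threaded handler).
ps -eo comm= 2>/dev/null | grep '^irq/' || true
```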
---

Oleksandr Natalenko — 2015-08-27 01:23:
@Manuel, take it easy :).

@Alfred, are more patches to test coming?

---

Alfred Chen — 2015-08-26 19:35:
I'm still investigating the unplugged_io patch and trying to improve it. As for the kernel's new ZFS trace: I believe the rcu preempt checking most likely happens at schedule time, so it's hard to tell whether it's a scheduler issue.

For the next test patch, I currently think preempt should be disabled for the additional checking, but that may impact performance, so I need a benchmark to see how it goes. I'll start a new post once it is done. This one is growing long and off-topic :)

---

Manuel (Anonymous) — 2015-08-26 12:41:
Maybe I also misused the word "bad". I just see the other side of the coin, too: even "bad" news, the kind regarding failures, is "good" news, as it leads to fixes, sooner or later, for our beloved Linux operating system.

Best regards,
Manuel

---

Manuel (Anonymous) — 2015-08-26 12:27:
@post-factum:
Sorry, you've definitely got me wrong. I meant: as long as we don't get lockup messages from your side, everything seems good for the testing you've done so far. Longer, but more precise.

I didn't intend to say that you only bring bad news. I really appreciate your work and testing time and would never want to be impolite to you.

Best regards,
Manuel

---

Oleksandr Natalenko — 2015-08-26 11:41:
> no bad news from post-factum is good news? Isn't it?

Oh, jerk off with that :/. As if I bring bad news only.

Anyway, the second patch still works OK for me.

---

Oleksandr Natalenko — 2015-08-26 11:40:
This comment has been removed by the author.

---

Manuel (Anonymous) — 2015-08-26 10:52:
I think no bad news from post-factum is good news? Isn't it?

What about the new patch you mentioned on August 25, 2015 at 8:20 AM? Or are you still investigating whether kernelOfTruth's traces may be scheduler related or not?

BR Manuel

---

Alfred Chen — 2015-08-25 19:19:
@kernelOfTruth
Most likely not. But I'm sure it's not the unplugged_io issue we are tracing.
---

kernelOfTruth — 2015-08-25 15:38:
Just had a hardlock during a ZFS snapshot send:

Aug 26 00:29:13 morpheus kernel: [69082.418467] INFO: rcu_preempt detected stalls on CPUs/tasks:
Aug 26 00:29:13 morpheus kernel: [69082.418477]  4: (0 ticks this GP) idle=9f9/140000000000000/0 softirq=3923228/3923228 fqs=12328 last_accelerate: f53f/85c8, nonlazy_posted: 0, L.
Aug 26 00:29:13 morpheus kernel: [69082.418481]  5: (1 GPs behind) idle=8c7/140000000000001/0 softirq=2298621/2298622 fqs=12328 last_accelerate: f53f/85c8, nonlazy_posted: 0, L.
Aug 26 00:29:13 morpheus kernel: [69082.418482]  (detected by 3, t=37002 jiffies, g=1688364, c=1688363, q=13497)
Aug 26 00:29:13 morpheus kernel: [69082.418485] Task dump for CPU 4:
Aug 26 00:29:13 morpheus kernel: [69082.418486] irq/23-ehci_hcd R running task 0 353 2 0x00000008
Aug 26 00:29:13 morpheus kernel: [69082.418488]  ffffffff81e796ae ffffffff81e7b192 0000000000000003 ffff8807f9850000
Aug 26 00:29:13 morpheus kernel: [69082.418490]  ffff8800cf1a0000 ffff8800cf19fd68 ffff8807f4b2cf00 ffff8807f4e40800
Aug 26 00:29:13 morpheus kernel: [69082.418492]  ffff8807f4e40800 ffff8800cf1a0000 ffffffff8114d640 ffff8800cf19fd88
Aug 26 00:29:13 morpheus kernel: [69082.418494] Call Trace:
Aug 26 00:29:13 morpheus kernel: [69082.418508]  [] ? __schedule+0x11ae/0x2c60
Aug 26 00:29:13 morpheus kernel: [69082.418510]  [] ? schedule+0x32/0xc0
Aug 26 00:29:13 morpheus kernel: [69082.418513]  [] ? irq_thread_fn+0x40/0x40
Aug 26 00:29:13 morpheus kernel: [69082.418516]  [] ? usb_hcd_irq+0x21/0x40
Aug 26 00:29:13 morpheus kernel: [69082.418517]  [] ? irq_forced_thread_fn+0x2e/0x70
Aug 26 00:29:13 morpheus kernel: [69082.418519]  [] ? irq_thread+0x13f/0x170
Aug 26 00:29:13 morpheus kernel: [69082.418520]  [] ? wake_threads_waitq+0x30/0x30
Aug 26 00:29:13 morpheus kernel: [69082.418521]  [] ? irq_thread_dtor+0xb0/0xb0
Aug 26 00:29:13 morpheus kernel: [69082.418524]  [] ? kthread+0xf2/0x110
Aug 26 00:29:13 morpheus kernel: [69082.418528]  [] ? sched_clock+0x9/0x10
Aug 26 00:29:13 morpheus kernel: [69082.418530]  [] ? kthread_create_on_node+0x2f0/0x2f0
Aug 26 00:29:13 morpheus kernel: [69082.418532]  [] ? ret_from_fork+0x42/0x70
Aug 26 00:29:13 morpheus kernel: [69082.418533]  [] ? kthread_create_on_node+0x2f0/0x2f0
Aug 26 00:29:13 morpheus kernel: [69082.418534] Task dump for CPU 5:
Aug 26 00:29:13 morpheus kernel: [69082.418535] irq/33-xhci_hcd R running task 0 840 2 0x00000008
Aug 26 00:29:13 morpheus kernel: [69082.418537]  0000000000000003 ffff88066ef1eb80 ffff8800be358000 00000000f9852300
Aug 26 00:29:13 morpheus kernel: [69082.418539]  00000000296b0ad0 ffff8807f5593d68 ffff8807f550d100 ffff8807f51c5a00
Aug 26 00:29:13 morpheus kernel: [69082.418541]  ffff8807f51c5a00 ffff8807f50d4600 ffffffff8114d640 ffff8807f5593d88
Aug 26 00:29:13 morpheus kernel: [69082.418543] Call Trace:
Aug 26 00:29:13 morpheus kernel: [69082.418544]  [] ? irq_thread_fn+0x40/0x40
Aug 26 00:29:13 morpheus kernel: [69082.418557]  [] ? xhci_msi_irq+0xc/0x10 [xhci_hcd]
Aug 26 00:29:13 morpheus kernel: [69082.418558]  [] ? irq_forced_thread_fn+0x2e/0x70
Aug 26 00:29:13 morpheus kernel: [69082.418559]  [] ? irq_thread+0x13f/0x170
Aug 26 00:29:13 morpheus kernel: [69082.418561]  [] ? wake_threads_waitq+0x30/0x30
Aug 26 00:29:13 morpheus kernel: [69082.418562]  [] ? irq_thread_dtor+0xb0/0xb0
Aug 26 00:29:13 morpheus kernel: [69082.418563]  [] ? kthread+0xf2/0x110
Aug 26 00:29:13 morpheus kernel: [69082.418565]  [] ? sched_clock+0x9/0x10
Aug 26 00:29:13 morpheus kernel: [69082.418567]  [] ? kthread_create_on_node+0x2f0/0x2f0
Aug 26 00:29:13 morpheus kernel: [69082.418568]  [] ? ret_from_fork+0x42/0x70
Aug 26 00:29:13 morpheus kernel: [69082.418570]  [] ? kthread_create_on_node+0x2f0/0x2f0
Aug 26 00:32:17 morpheus kernel: [    0.000000] Initializing cgroup subsys cpuset

Looks like it's most likely not related to the scheduler, no?

---

Alfred Chen — 2015-08-25 08:20:
Thanks to all of you for testing. While waiting for pf's final confirmation, I'd like to prepare another patch for testing.

BR Alfred
---

Oleksandr Natalenko — 2015-08-25 06:58:
Also:

===
pf@defiant:~ » uptime
 16:57:31 up 5:43, 1 user, load average: 3.51, 1.92, 1.17
pf@defiant:~ » sudo btrfs scrub status /
scrub status for 14140a7f-23bc-4dab-b263-f2f46f5d70aa
        scrub started at Tue Aug 25 16:55:10 2015 and finished after 00:02:15
        total bytes scrubbed: 76.83GiB with 0 errors
===

Still works OK, but the uptime is too small; I need more time.

---

kernelOfTruth — 2015-08-25 05:59:
Stupid blogger interface? Where did my post go?

@Alfred:

Great news!

It survived the first 2 minutes and finished without hardlocks (5-6 hours).

Once there are enough changes to the system, I'll attempt another stage4 backup and see whether that hardlocks the system, but I doubt it will :)

Awesome work!

---

Oleksandr Natalenko — 2015-08-25 01:05:
Compiling and testing sched_submit_work_02.patch, stay tuned.

---

Manuel (Anonymous) — 2015-08-24 15:48:
I don't see/feel any negative subjective effects with -vrq and the new patch. Uptime ~9h.

BR Manuel
---

kernelOfTruth — 2015-08-24 15:34:
Will test, perhaps at the weekend or earlier.

The lockups would mostly occur with Btrfs. I haven't used ext4 for a long time, so I'm not sure if there are still quirks with it.

Fingers crossed that this fixes it =)

---

Manuel (Anonymous) — 2015-08-24 15:25:
@kernelOfTruth & @post-factum:

Now it seems to be up to you to prove that the new
https://bitbucket.org/alfredchen/linux-gc/downloads/sched_submit_work_02.patch
works for you even on a btrfs scrub. I'm running it on the -vrq branch, btw.

Thank you all for your participation,

Manuel

---

Manuel (Anonymous) — 2015-08-24 15:19:
No one could ever count on crossposting. But especially on here? ;-)

You've seen that I've done some compressing of ext4 partitions' content without issues. It was only about 1.2 GiB.

Thank you for your added info.

Manuel

---

Manuel Krause (Anonymous) — 2015-08-24 15:10:
Need to add: all involved partitions are ext4. ^^ *MK

---

kernelOfTruth — 2015-08-24 15:07:
Yes, it's another possible trigger scenario.

Not concurrently, but yes, separately; there were also certain rsync jobs running, but that doesn't seem to apply here.

Sure, here is the exclude list:

/mnt/*
/boot/*
/tmp/*
/proc/*
/home/*
/sys/*
/usb/*
/var/cache/edb/dep/*
/var/cache/squid/*
/var/tmp/*
/media/*
/usr/portage/*
/usr/gentoo/*

There were issues with the restored system when including /dev/* in that list, so I deliberately left it out.

Also, I have a separate backup command for /boot, but that doesn't matter for this purpose; the point is simply to cause a high i/o, CPU and scheduler load.

Yes, mmt equals the number of cores. AFAIK it should do it automatically (?), but I remember having had issues in the past without it (less throughput).

It's rooted in Gentoo's stages and backup procedures:

http://badpenguins.com/gentoo-build-test/
http://www.gentoo-wiki.info/HOWTO_Custom_Stage4
https://wiki.gentoo.org/wiki/Handbook:AMD64/Installation/Media#What_are_stages_then.3F

A stage4 in this case would be a fully installed and configured system :)
Stage3 is where you usually start when following the Gentoo handbook.