Input/Output error #2
Comments
https://bbs.archlinux.org/viewtopic.php?id=205147&p=40 post 999 might be related. It seems to refer to the vanilla 4.8 kernel. I have had a quick look at backporting the nvme patches to 4.7.x. It's not trivial. |
I've tried with the vanilla 4.8 kernel and get no such errors. I get the errors with the v4 nvme patches from Andy Lutomirski with what I believe is a Samsung 950 Pro (Device: pci 0xa802). When I ordered this laptop, this drive was exclusive to the workstation model and not available on the XPS 15 9550, so I'm not sure whether it's an incompatibility between the specific hardware and the patches. Either way, I suspect I should probably report this upstream to the LKML as well. |
Just on the off-chance that you ran into the bug in 4.8 (http://lkml.iu.edu/hypermail/linux/kernel/1610.0/00878.html), I have updated it to 4.8.1. If this does not fix it (chances are slim, as you tried vanilla), reporting upstream is the way to go. Please keep us updated if you do. |
I've reported this bug upstream as it is consistent for me across all versions of 4.8.x that have been released so far. I have a Dell engineer in the loop as well in case there is something specific to the platform that is contributing to this issue. Thanks for your quick feedback on it; I'll update here if anything gets resolved (or at least clarified). There was a similar bug report, "NVMe device suddenly unavailable", that fizzled out without resolution, but it doesn't appear to be related specifically to the NVMe APST patches, so I may actually be seeing some other issue that only coincides with applying the patch. |
Great to hear! I will leave this issue open so we can receive your feedback in it. Thanks! |
Quick update here: we've debugged this problem to the point where the upstream kernel developers have identified a possible solution. Andy is working on a new version of the patch. I can copy you on the mail thread instead of sending updates here if that's a more convenient way to track the issue. |
There's an updated experimental patch set for this that attempts to fix the bug that I reported upstream. It's available at 20160512-nvme-test. By default, applying this patch set (which includes eight patches if you go up the tree from 1a075417a8c9 nvme/scsi: Remove START STOP emulation to aab102ebbaae dev_pm_qos: |
Thanks for the updates on this. Would this also affect the PM951? The "quirk table" patch doesn't specifically include it, but I wonder if there are any lasting consequences before I go about trying it 😄 |
I have not heard of any lasting consequences. The patches included in my 4.10 only differ slightly from what you will find in mainline 4.11rcX. |
Thanks. I'm running the APST-full patches against 4.10.3 and am seeing ~1W reduction at idle 👍 |
To the best of my knowledge, this is an issue that's specific to the model listed in the quirk table and possibly even the Dell platform that I encountered the bug on. The bug has finally been confirmed and replicated by Dell and is being formally investigated by Samsung in Korea (as of this morning) on the previous generation XPS 15 9550/Precision 5510 platform. I'm in the email chain and I'll keep this issue updated with any relevant information. |
Great, thanks! I will reopen this so that people know this is ongoing. |
With this patch now staged for inclusion in 4.11, more people are experiencing this bug. Two upstream reports can be tracked here: Bug 194921 - Kernel oopses/panics after controller gets reset. I guess this issue will be moot once the patches are upstreamed, but for the time being, I thought I'd report these here so that those affected can keep tabs on the relevant bugs. |
I think I'm running into this issue with a PM961. Hard to say for sure because it has only happened 3 times so far (the first time about 3 weeks ago). I'm definitely getting input/output errors each time, but I haven't managed to capture any error logs yet, and it occurs so infrequently it would be hard to test whether a change to APST settings makes any difference. I do see that someone else with a PM961 seems to be having the problem. See the end of the "NVMe sudden controller death" bug. Please let me know if there's any other info I can provide, or if there's a more appropriate place to post.
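For anyone trying to capture evidence the next time this happens, here is a minimal Python sketch that reads the APST latency limit and filters kernel log text for NVMe-related error lines. It rests on assumptions that are not from this thread: the sysfs path assumes a kernel with the mainline APST support (4.11 or later), where `nvme_core` exposes `default_ps_max_latency_us`; kernels carrying the out-of-tree patches discussed here may not expose it. The keyword list and both function names are hypothetical and only illustrative, not a complete list of what the kernel can print.

```python
from pathlib import Path

# Assumption: present on kernels with mainline APST support (4.11+);
# may be absent on kernels built with the out-of-tree patch sets.
APST_PARAM = Path("/sys/module/nvme_core/parameters/default_ps_max_latency_us")

# Illustrative keywords only; real failures may log something else.
KEYWORDS = ("i/o error", "timeout", "controller is down", "reset")


def read_apst_max_latency(path=APST_PARAM):
    """Return the configured APST latency limit in microseconds,
    or None if the parameter is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def find_nvme_errors(log_text):
    """Return the lines of log_text that mention nvme together with
    one of the illustrative error keywords."""
    hits = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if "nvme" in lowered and any(k in lowered for k in KEYWORDS):
            hits.append(line)
    return hits
```

Feeding `find_nvme_errors` the saved output of `dmesg` from an affected session, alongside the value from `read_apst_max_latency()`, would give the upstream reports something concrete to compare across APST settings. Since the failure makes storage unavailable, the log may need to be saved somewhere other than the NVMe drive.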
|
I'm not sure if this is specific to my particular hardware, but applying these power patches to the kernel on my Precision 5510 (workstation analogue of the XPS 15 9550) seems to result in instability that ultimately leads to the OS complaining of input/output errors as if there was something wrong with the underlying storage hardware. I have not noticed this issue when using Linux 4.7.3 with its older NVMe codebase. Is this something that has been noticed on any other hardware?