NVIDIA and KDE.

Jan 25, 2018 19:47


I have a home workstation box running Gentoo Linux. The desktop environment is KDE Plasma, currently at 5.11.5, and that is apparently not a good combination with the current NVIDIA driver. In the past I have seen occasional broken OpenGL compositor behavior, when a window suddenly flickered to black until redrawn. Otherwise everything worked properly, and I did not bother investigating what was a known but rare annoyance. Recently, however, the problem got much worse: the screen occasionally turned solid black, and the driver produced the infamous "XID 31" log message.

The problem is not fatal: switching to a text console, restarting kwin_x11 and switching back to the desktop restores sanity while the rest of the session remains running. However, it became frequent enough to be very annoying. I was also afraid that whatever was confused enough to access the wrong video memory address (which is what "XID 31" means according to NVIDIA) might also do something worse -- leak objects, corrupt GPU data or crash the driver.

A quick search showed that no one is sure what exactly is happening, however the problem seems to be mostly confined to recent NVIDIA drivers combined with recent KDE, specifically the KWin window manager. It disappears when KWin compositing is disabled, however that also turns smooth screen updates into the dreaded "window tearing" pattern. Basically, video scanning and screen updates act as a rolling shutter, producing a clearly visible horizontal line that separates the part of the screen showing the new frame from the part where the previous one is still displayed. The easiest way to reproduce this is to move a window and look at its left and right margins: they often show a "stair step" at a random position. It's possible to enable minimal compositing with XRender, however that cripples performance and does not work with everything.

To make things worse, it was not clear what exactly is wrong. KDE developers blamed NVIDIA, and they have a point, considering that NVIDIA drivers are proprietary and therefore it's impossible to debug their internals. NVIDIA people did not seem eager to investigate this -- they seem to be still figuring out how to reproduce the misbehavior, and it is clear that no reliable method to reproduce it exists. Apparently there is also a long-running conflict between the maintainers of KWin and NVIDIA developers, and it is related to Wayland, something I would rather avoid touching, especially when those two groups are involved.

So my first reaction was to use one nice feature that KDE has -- its ability to use alternative window managers without breaking horribly. I took a well-trodden path and changed the window manager from KWin to Openbox. Current KDE configuration tools do not allow switching window managers through the configuration GUI, however just having a session with KDEWM="/usr/bin/openbox" is sufficient. Gentoo has a package "x11-wm/openbox" that contains a /usr/bin/openbox-kde-session file doing just that, so it did not require me to configure anything special. Openbox does not provide compositing by itself, however it works with external compositors. I installed Compton and configured it to be started with the KDE session.
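
For illustration, a session wrapper of this kind can be sketched in a few lines of shell. This is not the content of the Gentoo openbox-kde-session file; start_kde_with_wm is a made-up helper name, and startkde is assumed to be the Plasma 5 session launcher available on the system.

```shell
#!/bin/sh
# Sketch of a session wrapper that starts KDE with an alternative window manager.
# start_kde_with_wm is a hypothetical helper name; startkde is assumed to be the
# Plasma 5 session launcher installed on the system.
start_kde_with_wm() {
    wm="$1"
    # Fail early if the requested window manager is not an executable file.
    if [ ! -f "$wm" ] || [ ! -x "$wm" ]; then
        echo "window manager not found: $wm" >&2
        return 1
    fi
    # The KDE session picks up KDEWM and launches it in place of KWin.
    export KDEWM="$wm"
    # Replace this shell with the KDE session launcher.
    exec startkde
}
```

A session script would end with a call such as start_kde_with_wm /usr/bin/openbox. The same sketch would apply to any other window manager by passing a different path.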

It worked. Whatever problem triggered a bug in NVIDIA code from the KWin compositor, or whatever bug existed in the KWin compositor that manifested with the NVIDIA driver and hardware, did not show up after days of use. Unfortunately one more thing never showed up, either -- switching to the virtual desktop containing a window when I select that window in the task manager bar on my KDE panel. I had to explicitly select the right desktop in the desktop pager to see the window if it was not already on the current desktop. That was inconvenient: I am accustomed to all windows from all desktops being listed in the task manager bar, and all of them being accessible instantly from there.

Then I decided to try Fluxbox, a window manager without this problem but otherwise similar to Openbox. There was no predefined package for this, however it was easy to produce another session definition, with Fluxbox in place of Openbox. For some reason that did not work well -- whatever Fluxbox and KDE did on startup did not mesh well, and once KDE startup was complete, Fluxbox was running, but KDE windows did not end up in the right positions and stacking order. Restarting Fluxbox manually solved that problem: KDE windows went to their proper places, and everything worked perfectly -- until the next reboot, when I had to restart Fluxbox manually again. Strangely enough, logging out and in again did not produce the problem, even though no running processes of the logged-in user were kept from the old session, and the X server was restarted as well. Either the problem was with timing, or some relevant state was kept in files and became invalid on reboot. I did not investigate this further, though I believe it would make sense for Fluxbox developers to look there.

And then I got an idea -- why am I trying to change the window manager in the first place? KWin can run as a non-compositing window manager, so why don't I simply disable compositing in the KDE configuration (that only affects KWin), and run a separate compositor just like I did with Openbox? I was not sure whether it would work -- after all, KWin might assume that only its internal compositor implementation can be used with it. I left the Compton startup entry enabled, switched back to the "normal" KDE session, and disabled all compositing in the KDE configuration.

That seems to be the optimal solution now. KDE behaves normally as long as I don't try to re-enable its compositing. If I do so, it switches to the problematic mechanism, and I have to disable it again and restart Compton. However, there is no need to do that -- Compton works as the compositor, KWin works as the window manager, the driver never shows any errors, and there are no problems on the screen.

The original problem is still unsolved: I have no idea why there are errors with the KWin compositor and not with Compton. It is not even clear what is at fault, the NVIDIA driver or KWin.

Openbox could benefit from a configuration option that enables a desktop switch on window selection -- then it would be easier to use with KDE, and it would be a better alternative for situations when KWin does not do what the users want.

Fluxbox and KDE startup somehow ran into a race condition, and it would make sense to investigate this further, since this combination works well as long as both are started separately.

However, as far as the particular problem of broken desktop behavior is concerned, the workaround is sufficient. If you have the same combination of recent KDE and NVIDIA hardware, and the problem still exists by the time you are reading this, the solution is:
  1. Install Compton package ("x11-misc/compton" on Gentoo, "compton" on Debian and Ubuntu).
  2. In KDE System Settings, Display and Monitor, Compositor section, uncheck "Enable compositor on startup".
  3. In the same KDE System Settings, Startup and Shutdown, add a startup entry for "compton -b".
  4. Edit $HOME/.config/compton.conf to include:
    backend = "glx";
    vsync = "opengl-swc";
  5. Log out and log in again, to verify that startup settings work properly.
    ps x|grep compton|grep -v grep
    should return a line that ends in "compton -b"
  6. (optional) From KDE session, edit $HOME/.config/compton.conf, then from a terminal, restart Compton with:
    killall compton ; compton -b
    until the settings for compositing effects produce the desired results.
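
Steps 4 and 5 above can also be scripted. The following is only a sketch: write_compton_conf and compton_running are made-up helper names, and the helper deliberately refuses to touch an existing configuration file rather than guessing how to merge settings into it.

```shell
#!/bin/sh
# Sketch: create a Compton config containing the two settings from step 4.
# write_compton_conf and compton_running are hypothetical helper names,
# not part of Compton or KDE.
write_compton_conf() {
    conf="${1:-$HOME/.config/compton.conf}"
    # Never overwrite a configuration that already exists.
    if [ -e "$conf" ]; then
        echo "refusing to overwrite existing $conf" >&2
        return 1
    fi
    mkdir -p "$(dirname "$conf")" || return 1
    cat > "$conf" <<'EOF'
backend = "glx";
vsync = "opengl-swc";
EOF
}

# Step 5: succeed only if a process named exactly "compton" is running.
compton_running() {
    pgrep -x compton >/dev/null 2>&1
}
```

After logging in again, something like "compton_running && echo ok" gives the same answer as the ps pipeline in step 5.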

Update (2018-02-15): It looks like the problem was fixed in KDE Plasma 5.12.0 -- I have not seen it after upgrading to 5.12.0 (later 5.12.1) and reverting from Compton to KWin compositing. I am not entirely sure about it because the problem was intermittent to begin with, and it may be that the new version simply doesn't hit the conditions that trigger it as often as the old one did. However, so far all windows are displayed properly, and there are no "XID 31" log messages.

Update (2018-02-18): 5.12.1 still managed to produce an error:

[721975.776488] NVRM: Xid (PCI:0000:07:00): 31, Ch 00000020, engmask 00000101, intr 10000000

So the combination of KWin compositor and NVIDIA is still broken. Switching back to Compton.
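
Since the error is intermittent, it helps to check the kernel log for it from time to time. A minimal sketch, assuming the log lines have the same layout as the one quoted above (xid_codes is a made-up helper name):

```shell
#!/bin/sh
# Sketch: read kernel log lines on stdin and print the Xid number from every
# NVRM Xid line. xid_codes is a hypothetical helper name; the pattern assumes
# the "NVRM: Xid (<device>): <number>, ..." layout shown in the log line above.
xid_codes() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p'
}
```

Running "dmesg | xid_codes | sort | uniq -c" then counts how many times each Xid has been logged since boot. On some systems reading dmesg requires root.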

linux, hardware, software
