Discussion:
[Bug 93746] New: Black screen when reconnecting display with DRI3
b***@freedesktop.org
2016-01-17 21:05:07 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=93746

Bug ID: 93746
Summary: Black screen when reconnecting display with DRI3
Product: xorg
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: Driver/Radeon
Assignee: xorg-driver-***@lists.x.org
Reporter: ***@bernd-steinhauser.de
QA Contact: xorg-***@lists.x.org

One of my screens (Eizo EV2455) has an odd behaviour: When switched off, it
completely kills the connection, which is very annoying, but that's not the
issue here.

However, when I turn it on again, I'm presented with a black screen on that
screen only (the other two work fine).
I can see the mouse pointer, and I can see it change shape when over a text
field, but the window itself is not visible; it's hidden behind the blackness.

The issue goes away when I set
Option "DRI" "2"

Please see
https://bugs.kde.org/show_bug.cgi?id=357988
as well.

GPU is an AMD Kaveri. Kernel is 4.4.0. Mesa is currently scm, but I've seen
this with 10.x as well. xf86-video-ati is 7.6.1.

What other info do you need?
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freedesktop.org
2016-01-18 15:26:31 UTC

--- Comment #1 from Alex Deucher <***@gmail.com> ---
Please attach your xorg log and dmesg output.
b***@freedesktop.org
2016-01-21 17:21:03 UTC

--- Comment #2 from Bernd Steinhauser <***@bernd-steinhauser.de> ---
Created attachment 121189
--> https://bugs.freedesktop.org/attachment.cgi?id=121189&action=edit
dmesg output

Output by dmesg up to the point where the screen was switched off and on again,
including a VT change (working around the problem) afterwards.
b***@freedesktop.org
2016-01-21 17:21:35 UTC

--- Comment #3 from Bernd Steinhauser <***@bernd-steinhauser.de> ---
Created attachment 121190
--> https://bugs.freedesktop.org/attachment.cgi?id=121190&action=edit
Xorg log

Corresponding X log.
b***@freedesktop.org
2016-01-21 17:22:34 UTC

--- Comment #4 from Bernd Steinhauser <***@bernd-steinhauser.de> ---
(In reply to Bernd Steinhauser from comment #0)
> The issue goes away when I set
> Option "DRI" "2"

I have to correct myself here: I've seen this with DRI2 as well. I have
yet to find out what changed.
b***@freedesktop.org
2016-01-27 07:32:07 UTC

--- Comment #5 from Michel Dänzer <***@daenzer.net> ---
Does this only happen with Option "TearFree"? Only with a 4.4 kernel?

It sounds like it could be the problem discussed in
http://lists.freedesktop.org/archives/dri-devel/2016-January/098823.html and
followups.

P.S. How does this relate to bug 90987?
b***@freedesktop.org
2016-01-27 07:43:42 UTC

--- Comment #6 from Bernd Steinhauser <***@bernd-steinhauser.de> ---
(In reply to Michel Dänzer from comment #5)
> Does this only happen with Option "TearFree"? Only with a 4.4 kernel?

No, it doesn't seem to be related to the Pageflip option either.

> It sounds like it could be the problem discussed in
> http://lists.freedesktop.org/archives/dri-devel/2016-January/098823.html and
> followups.

Could be, yes. However, I think that I had problems with Linux 4.2 as well, and
with 4.3 for sure. I bought this screen (the Eizo) in September, and since 4.2
came out in August, this seems likely.
I will try it out, though, during my next test run tomorrow.

> P.S. How does this relate to bug 90987?

Not sure. The difference is that in this one only that single screen
misbehaves, whereas in bug 90987 all of them do.
I can reproduce both individually.
Also, while this one is fixable by switching to VT2 and back, bug 90987 seems
fixable only by restarting X (or at least I haven't found another way yet).
b***@freedesktop.org
2016-01-28 18:51:43 UTC

--- Comment #7 from Bernd Steinhauser <***@bernd-steinhauser.de> ---
I'm not 100% sure, but I'm fairly certain that I couldn't reproduce this bug
with kernel 4.2.5, so I think it's at least related to the linked email.

I was also able to reproduce this in such a way that all screens go black.
From the description of the option, I think that this was because I didn't have
TearFree set.

So I'm no longer sure whether it is really a separate bug or a dupe of 90987.
It could be that the options just change the buffer behavior in such a way that
it either can or cannot be fixed by switching to VT2.
If it's the same, I would have been able to reproduce it with 4.2.5, with the
effect described in bug 90987.
b***@freedesktop.org
2016-01-28 18:52:04 UTC

Bernd Steinhauser <***@bernd-steinhauser.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Black screen when           |Black screen when
                   |reconnecting display with   |reconnecting display
                   |DRI3                        |
b***@freedesktop.org
2016-01-29 02:34:19 UTC

--- Comment #8 from Michel Dänzer <***@daenzer.net> ---
(In reply to Bernd Steinhauser from comment #7)
> I'm not a 100% sure, but quite certain, that I couldn't reproduce this bug
> with kernel 4.2.5, so I think it's at least related to the linked email.

The problem referenced there was only introduced in 4.4, so if you can
reproduce this with 4.3, it's probably a different issue; can you bisect in
that case?
b***@freedesktop.org
2016-01-29 06:22:15 UTC

--- Comment #9 from Bernd Steinhauser <***@bernd-steinhauser.de> ---
(In reply to Michel Dänzer from comment #8)
> (In reply to Bernd Steinhauser from comment #7)
> > I'm not a 100% sure, but quite certain, that I couldn't reproduce this bug
> > with kernel 4.2.5, so I think it's at least related to the linked email.
>
> The problem referenced there was only introduced in 4.4, so if you can
> reproduce this with 4.3, it's probably a different issue; can you bisect in
> that case?

Will try, but it could take a while; I don't know if I'll find time for that
during the weekend.
b***@freedesktop.org
2016-02-07 10:22:39 UTC

--- Comment #10 from Bernd Steinhauser <***@bernd-steinhauser.de> ---
Ok, I'm pretty sure I tracked it down:

commit 5b5561b3660db734652fbd02b4b6cbe00434d96b
Author: Mario Kleiner
Date: Wed Nov 25 20:14:31 2015 +0100

    drm/radeon: Fixup hw vblank counter/ts for new drm_update_vblank_count() (v2)


Obviously the same (or similar) commit did go into amdgpu which I didn't test
as it is marked experimental for Kaveri.

The revision before that didn't show the issue; with that revision, it occurred.

To be sure, I also tested 4.4.1 vanilla and with the commit reverted.
With vanilla, reconnecting the screen resulted in a black screen as described.
With the patch reverted, it did not happen.
b***@freedesktop.org
2016-02-07 18:33:43 UTC

--- Comment #11 from Bernd Steinhauser <***@bernd-steinhauser.de> ---
Ok, I compiled a kernel with amdgpu enabled and radeon disabled.
ddx is now xf86-video-amdgpu-scm.

So far I have not seen this issue with amdgpu, even though the corresponding
version of that commit is still applied
(commit 8e36f9d33c134d5c6448ad65b423a9fd94e045cf).

So at least it doesn't happen as often as with radeon (during my last tests
I saw it every time, but I am not sure it always happens).
I will try it for a couple of days to see whether it occurs or not.
b***@freedesktop.org
2016-02-07 18:34:49 UTC

--- Comment #12 from Bernd Steinhauser <***@bernd-steinhauser.de> ---
Should note, I set the same options for amdgpu as for radeon:
Section "Device"
    Identifier "AMDGPU"
    Driver "amdgpu"
    Option "TearFree" "On"
    Option "DRI" "3"
EndSection
b***@freedesktop.org
2016-02-08 01:50:55 UTC

--- Comment #13 from Mario Kleiner <***@tuebingen.mpg.de> ---
(In reply to Bernd Steinhauser from comment #12)
> Section "Device"
> Identifier "AMDGPU"
> Driver "amdgpu"
> Option "TearFree" "On"
> Option "DRI" "3"
> EndSection

Hi,

I just cc'ed you on a series of patches with fixes for Linux 4.4 and later. Can
you check whether applying those on top of 4.4 helps? At least it should remove
the problems discussed in
http://lists.freedesktop.org/archives/dri-devel/2016-January/098823.html, and
maybe this is related.
b***@freedesktop.org
2016-02-17 23:05:11 UTC

--- Comment #15 from Mario Kleiner <***@tuebingen.mpg.de> ---
Created attachment 121821
--> https://bugs.freedesktop.org/attachment.cgi?id=121821&action=edit
First proposed patch to fix this on radeon-kms
b***@freedesktop.org
2016-02-17 23:03:58 UTC

Mario Kleiner <***@tuebingen.mpg.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |***@tuebingen.mpg.de

--- Comment #14 from Mario Kleiner <***@tuebingen.mpg.de> ---
Documenting some progress made "outside" the bug reporter.

So attached a proposed patch. And a trimmed and annotated syslog output from
Bernd.

> > can you apply the attached patch on top of the others and test with
> > drm.debug=35 again, and ideally provide syslog output with the microsecond
> > timestamps included? This adds some debug statements to see if something
> > goes wrong in radeon_flip_work_func on radeon-kms.
>
> Feb 16 21:21:17.870174 orionis systemd-journald[618]: Missed 3 kernel messages
>
> The log is spammed with these messages. So I did some research and was
> pointed to the kernel parameter log_buf_len, which I set to 16M
> (actually tried values up to 128M), but that didn't help.
> I'll attach the log anyway, but if you tell me how to get rid of that
> issue, I can of course give it another go.
> For journald, I set RateLimitBurst=0, which should keep it from
> rejecting messages due to the log spam.
> I also did another experiment. The Dell U2415 is normally connected to
> HDMI-0.
> I connected that to DP-0 (with the Eizo disconnected) and found out
> that now the Dell shows the same behavior as the Eizo. When disconnected
> and reconnected, the screen will be black.
> If I turn on the DP1.2 setting for the Dell, it will do so even when
> switched off, most likely because the connection is cut with DP1.2
> turned on (really really annoying behavior, that's also the reason why I
> have so many problems with the Eizo, but for that DP1.2 cannot be
> switched off).
> Thus, it doesn't seem like the problem is related to specific display
> hardware.
> If you'd like, I can test the HP ZR24W as well (normally connected to DVI).
> Best Regards,
> Bernd

Thanks, that one was useful. Can you revert the debug patch I sent you last and
instead apply the attached patch and retest? Maybe also remove the patch
"[PATCH 1/2] drm/radeon: Make vbl counter/ts queries robust against dpms
on/off. [RFC]", so your tree corresponds more closely to what is in 4.4/4.5rc.

This one is a proposed fix for the problem, also for stable 4.4.

It seems that when the connection gets cut while a pageflip gets queued by
the userspace driver, the radeon-kms driver does DPMS OFF as part of its
hotplug work function (apparently only for DP displays, if I understand the
code correctly?). Then, when radeon_flip_work_func() executes the "wait until
real start of vblank" code introduced in Linux 4.4 to fix other regressions,
that code executes while the display engine is already disabled and the scanout
is no longer moving. That leads the wait code to go into an infinite loop
(hence the huge amount of messages in your syslog): flip_work_func hangs ->
pageflip hangs -> game over.

So the attached patch should make that new wait code robust against such
unexpected things as dpms off/on in parallel. Maybe we could also manage to get
this into 4.5-rc5, now that the other vblank fixes have landed in Linus' tree.

I don't know if it is expected behavior that pageflips can be queued by
userspace while the display is disabled, or whether the ddx should already
prevent that. The log suggests that the ddx got the hot(un)plug event before it
tried to page flip anyway in TearFree.
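
As a rough illustration of the hang described above (this is not the attached
patch, and not kernel code): the toy model below, in Python, polls a scanout
position until it reaches vblank. All names in it (FakeCrtc, wait_for_vblank,
max_polls) are hypothetical. It assumes that a disabled display engine simply
stops advancing the scanout position, so a wait loop that only checks the
position would never finish; the sketch guards against that by checking an
"enabled" flag and capping the number of polls.

```python
from dataclasses import dataclass


@dataclass
class FakeCrtc:
    """Hypothetical stand-in for a display controller; not a kernel structure."""
    enabled: bool = True
    vpos: int = 0          # current scanout line
    vblank_start: int = 4  # first line of the vertical blanking interval
    vtotal: int = 6        # lines per frame, including vblank

    def tick(self) -> None:
        # The scanout position only advances while the display engine runs.
        if self.enabled:
            self.vpos = (self.vpos + 1) % self.vtotal

    def in_vblank(self) -> bool:
        return self.vpos >= self.vblank_start


def wait_for_vblank(crtc: FakeCrtc, max_polls: int = 1000) -> str:
    """Poll until scanout enters vblank; give up if the CRTC is off.

    Without the 'enabled' check and the poll cap, a CRTC that was
    switched off mid-wait would keep this loop spinning forever.
    """
    for _ in range(max_polls):
        if not crtc.enabled:
            return "crtc-disabled"
        if crtc.in_vblank():
            return "in-vblank"
        crtc.tick()
    return "timeout"
```

With a running FakeCrtc the wait ends once the position reaches vblank_start;
with enabled=False it returns immediately instead of spinning. Which guard the
real fix uses is for the attached patch to say, not this sketch.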
b***@freedesktop.org
2016-02-17 23:09:28 UTC

--- Comment #16 from Mario Kleiner <***@tuebingen.mpg.de> ---
Created attachment 121822
--> https://bugs.freedesktop.org/attachment.cgi?id=121822&action=edit
Annotated and trimmed kernel log which points to the cause of the problem.

See "-->" for key events.
b***@freedesktop.org
2016-02-18 04:31:49 UTC

--- Comment #17 from Bernd Steinhauser <***@bernd-steinhauser.de> ---
(In reply to Mario Kleiner from comment #15)
> Created attachment 121821
> First proposed patch to fix this on radeon-kms

This does work, I no longer observe the problem, thanks.

Maybe the reporter of bug 90987 should test the patch as well, since after all
it wasn't clear whether the bug was the same or different (or maybe the patch
fixes both bugs).

(I know that this was a regression from 4.4, but maybe there was a different
way to trigger this before.)
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freedesktop.org
2016-02-19 00:06:15 UTC

--- Comment #19 from Mario Kleiner <***@tuebingen.mpg.de> ---
Created attachment 121835
--> https://bugs.freedesktop.org/attachment.cgi?id=121835&action=edit
Port of the radeon-kms patch v2 to amdgpu

Identical patch for amdgpu.
b***@freedesktop.org
2016-02-19 00:04:58 UTC

Mario Kleiner <***@tuebingen.mpg.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #121821|0                           |1
        is obsolete|                            |

--- Comment #18 from Mario Kleiner <***@tuebingen.mpg.de> ---
Created attachment 121834
--> https://bugs.freedesktop.org/attachment.cgi?id=121834&action=edit
Patch for fix on radeon-kms (v2) reviewed and tested.

Final patch for Linux 4.4 stable and later.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freedesktop.org
2018-07-10 14:43:18 UTC

Michel Dänzer <***@daenzer.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
         QA Contact|xorg-***@lists.x.org        |
          Component|Driver/Radeon               |DRM/Radeon
            Product|xorg                        |DRI
           Assignee|xorg-driver-***@lists.x.org |dri-***@lists.freedesktop.org
             Status|NEW                         |RESOLVED

--- Comment #20 from Michel Dänzer <***@daenzer.net> ---
Thanks for the report. Resolving, as Mario's fixes landed long ago.