Opened 9 years ago

Last modified 2 weeks ago

#12886 accepted defect

crash in wxSocket_GDK_Input()

Reported by: ruslanch Owned by: vadz
Priority: high Milestone: 3.2.0
Component: network Version: stable-latest
Keywords: wxSocketBase, thread, OnReadWaiting Cc: cafedetal@…
Blocked By: Blocking:
Patch: no

Description

My server uses wxSocketBase in the secondary thread and get pure virtual method call, or wxWidgets/src/unix/sockunix.cpp (144): assert "m_fd! = INVALID_SOCKET" failed in OnReadWaiting(): invalid socket ready for reading? at the second or third connection from the client. The minimal code that demonstrates the problem:

wxThread::ExitCode ClientThread::Entry()
{
   wxMemoryBuffer membuf(4096);

   while ( !TestDestroy() )
   {
      if ( m_socket->WaitForRead(0, 500) )
      {
         m_socket->Read(membuf.GetData(), membuf.GetBufSize());
         if ( m_socket->LastError() != wxSOCKET_NOERROR )
            break;
      }
   }

   m_socket->Destroy();
   m_socket = NULL;

   return NULL;
}

The client sends a message to the server and closes its socket. Bug is not reproduced in the console, because it uses another implementation wxFDIOManager. I use wxWidgets from trunk, Linux 2.6.36, glib 2.26.1, gtk+ 2.22.1.

Attachments (5)

main.cpp download (1.9 KB) - added by ruslanch 9 years ago.
Minimal example
test.py download (201 bytes) - added by ruslanch 9 years ago.
Test
fix-socket-events.patch download (4.8 KB) - added by chrisstankevitz 8 years ago.
do not reenable events if they were not enabled in the first place
client.cpp download (1.7 KB) - added by chrisstankevitz 8 years ago.
Example of an application that does not use wxSOCKET_BLOCK that does have disabled events that are subsequently re-enabled by wxSocketReaddGuard::~wxSocketReaddGuard
gtk-socket-thread-safe-hack.patch download (2.6 KB) - added by chrisstankevitz 8 years ago.
updated patch that does not change public interface. Makes only unix NOWAIT Notify(false) calls thread-safe.

Download all attachments as: .zip

Change History (24)

comment:1 Changed 9 years ago by ruslanch

The minimal code that demonstrates the problem:

wxThread::ExitCode ClientThread::Entry()
{
   wxByte buf[1024];
   m_socket->Read(buf, sizeof buf);
   m_socket->Destroy();
   m_socket = NULL;

   return NULL;
}

comment:2 Changed 9 years ago by vadz

  • Summary changed from crush in wxSocket_GDK_Input() to crash in wxSocket_GDK_Input()

How exactly is m_socket created? It would be really great to have a patch against the minimal sample if possible. TIA!

Changed 9 years ago by ruslanch

Minimal example

Changed 9 years ago by ruslanch

Test

comment:3 Changed 9 years ago by ruslanch

Example falls in wxSocket_GDK_Input():

pure virtual method called
terminate called without an active exception
[Thread 0x7fffed57c700 (LWP 4847) exited]

Program received signal SIGABRT, Aborted.
0x00007ffff525b655 in raise () from /lib/libc.so.6
(gdb) where
#0  0x00007ffff525b655 in raise () from /lib/libc.so.6
#1  0x00007ffff525cad6 in abort () from /lib/libc.so.6
#2  0x00007ffff5cfef7d in __gnu_cxx::__verbose_terminate_handler() ()
   from /usr/lib/libstdc++.so.6
#3  0x00007ffff5cfd196 in ?? () from /usr/lib/libstdc++.so.6
#4  0x00007ffff5cfd1c3 in std::terminate() () from /usr/lib/libstdc++.so.6
#5  0x00007ffff5cfdb1f in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
#6  0x00007ffff6b7a8ab in wxSocket_GDK_Input (data=0x7fffe40010f8, 
    condition=GDK_INPUT_READ) at ../wxWidgets/src/gtk/sockgtk.cpp:32
#7  0x00007ffff496718f in ?? () from /usr/lib/libgdk-x11-2.0.so.0
#8  0x00007ffff26c6bf3 in g_main_context_dispatch ()
   from /usr/lib/libglib-2.0.so.0
#9  0x00007ffff26c73d0 in ?? () from /usr/lib/libglib-2.0.so.0
#10 0x00007ffff26c7a42 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#11 0x00007ffff4d27917 in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0
#12 0x00007ffff6b74f40 in wxGUIEventLoop::Run (this=0x715d60)
    at ../wxWidgets/src/gtk/evtloop.cpp:60
#13 0x00007ffff601c235 in wxAppConsoleBase::MainLoop (this=0x675460)
    at ../wxWidgets/src/common/appbase.cpp:314
#14 0x00007ffff601c05f in wxAppConsoleBase::OnRun (this=0x675460)
    at ../wxWidgets/src/common/appbase.cpp:255
#15 0x00007ffff6c1a10b in wxAppBase::OnRun (this=0x675460)
    at ../wxWidgets/src/common/appcmn.cpp:284
---Type <return> to continue, or q <return> to quit---
#16 0x00007ffff60940d4 in wxEntry (argc=@0x7ffff642ae70, argv=0x66b750)
    at ../wxWidgets/src/common/init.cpp:472
#17 0x00007ffff6094194 in wxEntry (argc=@0x7fffffffe0ec, argv=0x7fffffffe1d8)
    at ../wxWidgets/src/common/init.cpp:484
#18 0x0000000000405566 in main (argc=1, argv=0x7fffffffe1d8) at main.cpp:30

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6b7a89f in wxSocket_GDK_Input (data=0x716998, 
    condition=GDK_INPUT_READ) at ../wxWidgets/src/gtk/sockgtk.cpp:32
32	        handler->OnReadWaiting();
(gdb) where
#0  0x00007ffff6b7a89f in wxSocket_GDK_Input (data=0x716998, 
    condition=GDK_INPUT_READ) at ../wxWidgets/src/gtk/sockgtk.cpp:32
#1  0x00007ffff496718f in ?? () from /usr/lib/libgdk-x11-2.0.so.0
#2  0x00007ffff26c6bf3 in g_main_context_dispatch ()
   from /usr/lib/libglib-2.0.so.0
#3  0x00007ffff26c73d0 in ?? () from /usr/lib/libglib-2.0.so.0
#4  0x00007ffff26c7a42 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#5  0x00007ffff4d27917 in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0
#6  0x00007ffff6b74f40 in wxGUIEventLoop::Run (this=0x715d60)
    at ../wxWidgets/src/gtk/evtloop.cpp:60
#7  0x00007ffff601c235 in wxAppConsoleBase::MainLoop (this=0x675460)
    at ../wxWidgets/src/common/appbase.cpp:314
#8  0x00007ffff601c05f in wxAppConsoleBase::OnRun (this=0x675460)
    at ../wxWidgets/src/common/appbase.cpp:255
#9  0x00007ffff6c1a10b in wxAppBase::OnRun (this=0x675460)
    at ../wxWidgets/src/common/appcmn.cpp:284
#10 0x00007ffff60940d4 in wxEntry (argc=@0x7ffff642ae70, argv=0x66b750)
    at ../wxWidgets/src/common/init.cpp:472
#11 0x00007ffff6094194 in wxEntry (argc=@0x7fffffffe0ec, argv=0x7fffffffe1d8)
    at ../wxWidgets/src/common/init.cpp:484
#12 0x0000000000405566 in main (argc=1, argv=0x7fffffffe1d8) at main.cpp:30

comment:4 Changed 8 years ago by vadz

  • Cc cafedetal@… added
  • Milestone set to 3.0
  • Status changed from new to confirmed

I can confirm the bug, we get a call to wxSocket_Input() after (or sometimes during, this is a race condition between two threads so anything is possible) the socket was destroyed.

I don't think this can be prevented except by artificially maintaining the socket (or at least the corresponding wxSocketImpl) alive until we get the last notification for it (about the socket closure?) but the main problem is not this but rather why do we use all this event loop related stuff at all when the sockets are supposed to be blocking. Reading the code again it just doesn't make sense to me that we do all this complicated stuff when enabling, disabling, blocking and unblocking events for the sockets which are not supposed to use them at all.

So I guess the fix for this problem would be to add tests for wxSOCKET_BLOCK to wxSocketImplUnix::EnableEvents() and UnblockAndRegisterWithEventLoop() (or maybe the code calling them?) and simply do nothing in this case. I hope I'm not missing something obvious but there doesn't seem to be any purpose to using them in wxSOCKET_BLOCK case...

Changed 8 years ago by chrisstankevitz

do not reenable events if they were not enabled in the first place

comment:5 Changed 8 years ago by chrisstankevitz

  • Patch set

I believe I fixed this:

wxSocketReadGuard and wxSocketWriteGuard ReenableEvents in their destructors whether the events should be or not. This patch causes the guards to ReenableEvents only if they were enabled in the first place.

comment:6 Changed 8 years ago by chrisstankevitz

Note: you should not call wxSockImpl::IsEventEnabled with more than one flag:

int flags = FLAG1 | FLAG2 | FLAG3;

// FLAG2 is enabled, FLAG1 and FLAG3 are disabled

bool reenable = pImpl->IsEventEnalbled(flags); // returns true for FLAG2

DisableEvents(flags);

// do work

if (reenable)
{
  ReeenableEvents(flags); // erroneously set FLAG1 and FLAG3 high
}

This is probably a sign that my patch is not an ideal fix.

comment:7 Changed 8 years ago by vadz

Thanks, I'll apply this patch before 3.0 if no better solution appears but I still think that, as I wrote in comment:4, we shouldn't be using the events (nor enabling/disabling them) at all when using wxSOCKET_BLOCK.

Do you have a test case where non-blocking sockets are used and which shows this problem? If so, when exactly do we instantiate wxSocketReadGuard when input events are currently disabled?

Changed 8 years ago by chrisstankevitz

Example of an application that does not use wxSOCKET_BLOCK that does have disabled events that are subsequently re-enabled by wxSocketReaddGuard::~wxSocketReaddGuard

comment:8 follow-up: Changed 8 years ago by chrisstankevitz

Vadim,

The attached file, client.cpp, implements a simple "echo client" that sends a message every 500ms to an "echo server."

If you run it (with an echo server) for a few seconds, *then after a few seconds* set a breakpoint in wxSocketReadGuard::~wxSocketReadGuard, you will find that the guard uses wxSocketImpl::ReenableEvents to turn on events that were not enabled within the corresponding wxSocketReadGuard::wxSocketReadGuard.

You will notice that in the first few iterations (5 or so), events were enabled in the wxSocketReadGuard constructor and were "correctly" re-enabled by the destructor. However, after some time, the events are disabled (I do not know by whom) upon entry into the constructor and then are subsequently re-enabled by the guard (without my patch) or left disabled (with my patch). This is easier to discover with my wxSockImpl patch as it provides wxSocketImpl::IsEventEnabled.

The mystery I have not solved (and did not even know until you asked for this test case) is: At what point are the events "disabled" in such a way as to cause my patch to be useful. Here are some guesses:

  1. There is some intelligent asynchronous process that understands events are not needed and disables them, but only after *a few seconds*.
  1. There is an error with my implementation of wxSocketImpl::IsEventEnabled and events are reported as disabled eventhough they are enabled.
  1. There is an error in wxWidgets in which a wxSocketReadGuard destructor is not executed and hence does not reenable events.

Chris

comment:9 Changed 8 years ago by chrisstankevitz

Note that client.cpp calls wxSocketBase::SetFlags(wxSOCKET_NOWAIT) inside the connection callback OnSocketEvent/wxSOCKET_CONNECTION.

comment:10 in reply to: ↑ 8 Changed 8 years ago by chrisstankevitz

Vadim,

After snooping around, I do not think you should apply my patch. I think it is exploiting a "logic error" in socket or something else that I do not understand. It might even break "normal" wxSocketEvent-based behavior.

These are the things that confuse me:

  1. wxSocket*Guard enables GTK callbacks in the destructor without checking whether or not they were enabled in the constructor or whether they should be enabled.
  1. On*Waiting disables GTK callbacks on entry, but it is not clear to me how/if/when they are re-enabled, if they need to be re-enabled.
  1. Code that enables/disables the GTK callbacks should be considering:
  • the state of wxSocketBase::GetFlags
  • the state of wxSocketBase::Notify
  • the state of wxSocketBase::SetNotify
  1. SetFlags, Notify, and SetNotify should be enabling/disabling events.
  1. I'm not sure what should happen if someone sets flags/notifies to disable events and then calls wxSocketBase::Wait. Perhaps those cases should assert (or perhaps they do already).
  1. I do not (yet) know or understand wxSocketManager
  1. I do not understand why Vadim says "events should be disabled when flags are set to wxSOCKET_BLOCK." Even with wxSOCKET_BLOCK, the user might still have Notify(true) and might still expect to receive wxSocketEvents. I'm not sure how this can happen if events are disabled.
  1. An example where GTK callbacks and wxSocketEvents should be disabled: Notify is false and flags are set to wxSOCKET_NOWAIT.

I'm going to keep snooping around. This is a problem for me and I will need to hack something together to get my application working properly.

Chris

Changed 8 years ago by chrisstankevitz

updated patch that does not change public interface. Makes only unix NOWAIT Notify(false) calls thread-safe.

comment:11 Changed 8 years ago by vadz

  • Patch unset

Sorry, I just don't understand the logic of this patch at all :-( Does "thread safe" here means "used from a separate thread"? In any case, why is it only "thread safe" if it uses wxSOCKET_NOWAIT (and doesn't use notifications), shouldn't the sockets using other flags from another thread not use events neither? Due to all these questions I'm afraid I can't apply this patch right now, I just don't understand what does it do.

I'm confused by the overall design of this code too but let me try to answer some of your questions from comment:10:

  1. The idea is that normally the notifications are always enabled "normally". Under Unix they're disabled by On{Read,Write}Waiting() so must be reenabled after reading or writing anything. IOW I think this code is OK.
  2. As explained above, this is the other half of (1).
  3. Not quite. Notify(false) disables the generation of wxSocketEvents for the socket but I don't think we can disable GTK callbacks as this would break sockets use completely, whether we want to have events for them or not. SetNotify() is similar but allows to only get wx events for read or for write and not for both of them. However GetFlags() is different.
  4. I don't think Notify() and SetNotify() should, as explained above. But SetFlags() would do it when switching to blocking mode and enable them back when switching to async again.
  5. I don't think Wait() is related to wx events. But I could be missing something.
  6. wxSocketManager is just a way to virtualize the behaviour during run-time. All the socket code is in net library which is the same in both console and GUI applications, so we want to do the same things in socket code but they must behave differently in these 2 cases. The manager is just an extra level of indirection allowing this.
  7. I don't think it makes sense to have events for blocking sockets -- when would you get them? If you block on the socket anyhow, event handler can't be executed. Also, blocking sockets should be only used in non-main thread in the GUI apps so the events from them wouldn't be very useful anyhow as they would be processed in a different thread. So I think I was right here (for once...).
  8. To return to what I started with, I simply don't understand why is it so...

Could we please discuss this on wx-dev? Hopefully an interactive discussion (we could also do it on #wxwidgets for even more interactivity) would be more helpful than using Trac.

comment:12 Changed 8 years ago by vadz

Please see this thread for the continuation of the discussion.

comment:13 Changed 6 years ago by vadz

  • Milestone changed from 3.0 to 3.2

Unfortunately I won't have time to deal with this before 3.0. Fixing it by just making blocking wxSockets really blocking would still be very welcome. I'd like to avoid any bigger modifications to this code however, especially in the close future.

comment:14 Changed 6 years ago by vadz

See also #10720.

comment:15 Changed 6 years ago by vadz

  • Priority changed from normal to high

This might be related to a similar problem under MSW (and maybe also this one?).

comment:16 Changed 4 years ago by dandi

Recently I came across this bug or something that seems to be related to this issue.

My issue looks the same: I have GUI and a worker thread that uses wxHTTP. After running the worker thread several times, it crashes (sometimes with SEG_FAULT, sometimes with "invalid socket" as in #10720).

I was using 3.0.2 from debian repositories. However, when I link my project against the latest version from git, everything seems to work fine, at least for the first ten runs. So I went further and wrote a simple test with GUI and a worker thread that creates, uses, and destroys wxHTTP in a loop. This test shows the same behavior as my big project: being linked against 3.0.2, it crashes during the first ten iterations, but if linked against 3.1.1, it runs perfectly for the first few thousands iterations (I didn't have patience to run it longer).

Were there any changes related to this issue? Maybe this ticket and related #10720 can be marked as solved?

Version 0, edited 4 years ago by dandi (next)

comment:17 Changed 2 weeks ago by vadz

  • Owner set to vadz
  • Status changed from confirmed to accepted

This one is different from #10720 as it starts with a non-blocking socket and then calls SetFlags(wxSOCKET_BLOCK) on it later. This should work because it's the only way to make a server socket blocking, but currently doesn't.

Let me try to finally fix this...

comment:18 Changed 2 weeks ago by vadz

I've created this PR which fixes the problem with the original test case from the comment:3, i.e. the code which made the socket blocking before passing it to the worker thread to handle it. I'm pretty sure that this PR does fix this particular problem as it was perfectly reproducible but now the test Python script ends without crashing the server and ASAN doesn't have anything to say neither.

I'm not sure if this is the only problem in this code, however. This definitely doesn't fix the case of non-blocking sockets, but I don't think we can do anything about them anyhow -- they just can't be used from the other threads (maybe we could assert more aggressively in this case). I'm also not sure if it fixes the problem from the comment:8 nor the problem reported in aMule with wx 3.x. It would be great if new tickets, with a way to reproduce the problem with the latest master, could be opened for them as the problem of this one was really just due to not handling switching the sockets returned from Accept() to blocking mode correctly and, again, I'm quite sure this one is fixed now.

TIA for your testing!

comment:19 Changed 2 weeks ago by vadz

FWIW using client.cpp from comment:8 with this simple echo server (adjusting the port number to 4040, of course) doesn't show any problems neither with nor without this PR, i.e. it works with unmodified 3.1.3 too.

Note: See TracTickets for help on using tickets.