Opened 5 years ago

Closed 4 years ago

Last modified 2 years ago

#11528 closed defect (fixed)

wxMSW: socket-based IPC hangs

Reported by: alarsen Owned by: vadz
Priority: normal Milestone: 2.9.1
Component: network Version: stable-latest
Keywords: wxMSW, wxSocket, IPC Cc:
Blocked By: Blocking:
Patch: no

Description

In a wxMSW release-build IPC stops working after the exchange of a few large messages; a debug-build asserts:
src/common/socket.cpp(207): assert "!m_socket->m_reading" failed in wxSocketReadGuard(): read reentrancy?

The assert can be reproduced using samples/ipc/ipc{server,client}.exe (compiled with wxUSE_DDE_FOR_IPC=0) and clicking "Request" a few times.
(The samples continue to work afterwards, perhaps due to the short messages exchanged.)

Tracing the internals with some strategic wxLogMessage() revealed that the following happens (in order):

wxTCPEventHandler::Client_OnRequest() calls
wxSocketBase::Read() which calls
wxSocketBase::DoRead() which calls
wxSocketBase::DoWait() which calls
wxEventLoopBase::GetActive()->DispatchTimeout();

from the event-loop

wxTCPEventHandler::Client_OnRequest() is called again on the same connection leading to
wxSocketBase::Read() being called recursively.

AFAICT the bug was introduced with r61488

I haven't seen this problem on Linux (despite wxIPC being used 24/7 on that platform)

Attachments (5)

ipcclient.patch download (627 bytes) - added by alarsen 5 years ago.
wxIdleEvent-MSW.patch download (717 bytes) - added by alarsen 5 years ago.
Fix the console event-loop on MSW to emit wxIdleEvent
0001-wxSocketBase-IsOk.patch download (818 bytes) - added by alarsen 5 years ago.
Let wxSocketBase::IsOk() return false if the socket is being deleted.
0002-wxSocketBase-Peek.patch download (777 bytes) - added by alarsen 5 years ago.
make wxSocketBase::Peek() behave intuitively
0003-wxTCPEventHandler.patch download (2.1 KB) - added by alarsen 5 years ago.
Fix wxTCPEventHandler to properly check for a valid connection.

Download all attachments as: .zip

Change History (34)

comment:1 Changed 5 years ago by vadz

  • Status changed from new to accepted

Thanks for debugging this, will try to have a look soon.

comment:2 follow-up: Changed 5 years ago by vadz

I do see the bug but AFAICS it's a bug in the sample which doesn't protect itself against reentrancy: indeed, when MyFrame::OnRequest() calls Request(), it's perfectly possible that another click event is generated from inside this Request() call as it operates in non-blocking mode. And this is what seems to provoke the assert (correctly).

So while this is definitely not nice, it doesn't look like a big problem for any properly written IPC code. What am I missing?

comment:3 in reply to: ↑ 2 ; follow-up: Changed 5 years ago by alarsen

Replying to vadz:

when MyFrame::OnRequest() calls Request(), it's perfectly possible that another click event is generated from inside this Request() call as it operates in non-blocking mode.

That's entirely possible of course, but it doesn't seem to be the cause of my problem:

To rule out that MyFrame::OnRequest() was called recursively I added a wxASSERT to the sample (see ipcclient.patch) and waited for the answer to appear in the log window before clicking request a second time; my wxASSERT is not triggered but the original wxASSERT is still provoked after 2 or 3 clicks.

Changed 5 years ago by alarsen

Changed 5 years ago by alarsen

Fix the console event-loop on MSW to emit wxIdleEvent

comment:4 follow-up: Changed 5 years ago by alarsen

The attached patch wxIdleEvent-MSW.patch fixes the console event-loop to emit wxIdleEvent as expected, so now the console samples can be used to debug the problem, too.

Changed 5 years ago by alarsen

Let wxSocketBase::IsOk() return false if the socket is being deleted.

Changed 5 years ago by alarsen

make wxSocketBase::Peek() behave intuitively

Changed 5 years ago by alarsen

Fix wxTCPEventHandler to properly check for a valid connection.

comment:5 follow-up: Changed 5 years ago by alarsen

  • Patch set

The trivial patches 0001-wxSocketBase-IsOk.patch, 0002-wxSocketBase-Peek.patch and 0003-wxTCPEventHandler.patch seem to fix the original problem.

As it turned out, on MSW we have to cope with spurious wxSOCKET_INPUT events which may arrive when no data is available to read (eventually causing a wxASSERT in wxSocketReadGuard) or even right after the client has closed the connection (if the server answers the disconnect, as happens with the samples).

Cheers
Anders

comment:6 in reply to: ↑ 4 ; follow-up: Changed 5 years ago by vadz

Replying to alarsen:

The attached patch wxIdleEvent-MSW.patch fixes the console event-loop to emit wxIdleEvent as expected, so now the console samples can be used to debug the problem, too.

Anders, thanks a lot for working on this but I wonder why do we need to do it -- normally ProcessIdle() is supposed to be called from the base class Run(). If this doesn't happen it must be because Pending() never returns false in wxMSW wxEventLoop implementation. But if this is the case, then the same problem must exist in the GUI applications too. So maybe this patch would be a better fix?

  • src/msw/evtloop.cpp

    a b wxMSWEventLoopBase::wxMSWEventLoopBase() 
    7474bool wxMSWEventLoopBase::Pending() const 
    7575{ 
    7676    MSG msg; 
    77     return ::PeekMessage(&msg, 0, 0, 0, PM_NOREMOVE) != 0; 
     77    return ::PeekMessage(&msg, 0, 1, UINT_MAX, PM_NOREMOVE) != 0; 
    7878} 
    7979 
    8080bool wxMSWEventLoopBase::GetNextMessage(WXMSG* msg) 

What do you think?

comment:7 in reply to: ↑ 5 ; follow-up: Changed 5 years ago by vadz

Replying to alarsen:

The trivial patches 0001-wxSocketBase-IsOk.patch, 0002-wxSocketBase-Peek.patch and 0003-wxTCPEventHandler.patch seem to fix the original problem.

Sorry for nitpicking again but I have questions about these patches. In order:

  1. IsOk() in the other classes just returns whether the object is initialized, I don't think it's 100% obvious that it may return false if the socket is being destroyed. Maybe the code using it should just use IsConnected() instead?
  2. No question about this one, will apply, thanks!
  3. This is the most problematic one as I think that if we really get such phantom input events, then we should filter them out at lower level, I don't see what could they be useful for and other code would probably be just as confused by them as wxIPC. I also would really like to know where do they come from in the first place... I don't think I saw them here although I could have missed them. Anyhow, what do you think about checking if input is really available before generating the event? I.e. do you see some obvious reason this wouldn't work?

Thanks in advance once again!

comment:8 Changed 5 years ago by VZ

(In [63282]) Never block in wxSocket::Peek().

Peek() is not expected to block so ensure that it doesn't, independently of
the currently used socket flags, by using wxSOCKET_NOWAIT.

See #11528.

comment:9 in reply to: ↑ 6 Changed 5 years ago by alarsen

Replying to vadz:

I wonder why do we need to do it -- normally ProcessIdle() is supposed to be called from the base class Run(). If this doesn't happen it must be because Pending() never returns false in wxMSW wxEventLoop implementation.

I have to admit that I don't understand the underlying cause of the problem; my patch was merely the result of a (lucky?) guess.

But if this is the case, then the same problem must exist in the GUI applications too.

GUI applications don't show this problem AFAICT.

So maybe this patch would be a better fix?

I tried it, but unfortunately it does not make any difference (still no idle processing).
Perhaps the OS behaves differently for non-GUI apps?

comment:10 in reply to: ↑ 7 Changed 5 years ago by alarsen

Replying to vadz:

  1. IsOk() in the other classes just returns whether the object is initialized, I don't think it's 100% obvious that it may return false if the socket is being destroyed. Maybe the code using it should just use IsConnected() instead?

OK, let's just drop that patch. I've come to the conclusion that it solves a non-existent problem anyway.

  1. No question about this one, will apply, thanks!

Thanks!

  1. This is the most problematic one as I think that if we really get such phantom input events, then we should filter them out at lower level, I don't see what could they be useful for and other code would probably be just as confused by them as wxIPC.

I also would really like to know where do they come from in the first place... I don't think I saw them here although I could have missed them. Anyhow, what do you think about checking if input is really available before generating the event?

You mean something like this?

  • src/msw/sockmsw.cpp

    a b LRESULT CALLBACK wxSocket_Internal_WinProc(HWND hWnd, 
    336336        wxASSERT_MSG( socket->m_fd == (SOCKET)wParam, 
    337337                      "mismatch between message and socket?" ); 
    338338 
     339        char ch; 
    339340        switch WSAGETSELECTEVENT(lParam) 
    340341        { 
    341342            case FD_READ: 
     343                if (recv(socket->m_fd, &ch, 1, MSG_PEEK) <= 0) 
     344                    return 0;    // no data available !?? 
    342345                event = wxSOCKET_INPUT; 
    343346                break; 
    344347 

It solves the problem nicely and is much less intrusive than my proposed patch nr. 3 (and only touches MSW code).
I'm still at a loss whether we're dealing with spurious or stale events, though.

Cheers
Anders

comment:11 follow-up: Changed 4 years ago by Pegasi

Continue of discussion Ticket:12060, duplicationg the current issue.

@alarsen:

Can it be that applied patch comment:ticket:11528:10 calling recv for PEEK initiates a new FD_READ?

Well, that patch was introduced because wxSOCKET_INPUT was already generated more than once for one single message, so how it could be the cause of the problem it's supposed to fix?

Peek in your patch posts a new FD_READ, because there is a data available to read at the moment of your recv (reenabling function) with PEEK call. See MSDN documentation for WSAAsyncSelect()

Any call to the reenabling routine, even one that fails, results in reenabling of message posting for the relevant event.

For not receiving the message atomically(like in the sockets sample and in my app: peek, header, body),

WSAAsyncSelect() states:

If an application issues multiple recv calls in response to a single FD_READ, it can receive multiple FD_READ messages. Such an application can require disabling FD_READ messages before starting the recv calls by calling WSAAsyncSelect with the FD_READ event not set.

sock->SetNotify does not calls WSAAsyncSelect with the FD_READ event not set, a s I understand, so we continue to receive FD_READ.

I believe, wxWidgets should allow to block FD_READ event during not atomically reading the message and a user should re-enable FD_READ before the reading the last piece of the message (message body in my case). If all data read - no event will come, but if there is data available - body read will generate FD_READ.

@vladz: you is the author of the last wxWidgets sockets code - what do you think about the proposal above?

By the way, I did not catch this bug  in the stable release 2.8.11  even if multiple reads are done (in sockets sample for example). IMHO, 3.0 stable release won't be 'stable' enough while the issue discussed here exists.

@alarsen, @vladz : did anybody catch the multiple FD_READ in wxWidgets 2.9.0 and above for the single message sent to the client after applying the patch comment:ticket:11528:10 ?

comment:12 in reply to: ↑ 11 Changed 4 years ago by alarsen

Replying to Pegasi:

If an application issues multiple recv calls in response to a single FD_READ, it can receive multiple FD_READ messages.

That's without doubt the underlying cause of all of our problems!

By the way, I did not catch this bug in the stable release 2.8.11 even if multiple reads are done (in sockets sample for example).

IPC (and wxSocket event handling in particular) has changed substantially in 2.9 (allowing e.g. the use of wxSockets in pure console applications), so that's not really surprising.

IMHO, 3.0 stable release won't be 'stable' enough while the issue discussed here exists.

It's a pure wxMSW problem - I have a wxBase 2.9 deployment on Linux relying heavily on wxIPC; it's been running flawlessly 24/7 for more than two years :)
Much as I'd like to see this issue fixed myself, we have to accept the fact that only a small fraction of the wx user base has any interest in wxIPC at all, so it'll never get a high priority.

Did anybody catch the multiple FD_READ in wxWidgets 2.9.0 and above for the single message sent to the client after applying the patch comment:ticket:11528:10] ?

I haven't noticed it (but extraneous FD_READ was indeed a problem without that patch).

comment:13 follow-up: Changed 4 years ago by vadz

I agree that the problem almost certainly is due to calling recv() multiple times but I still don't see what to do about it. I guess we could simply ignore read reentrancy instead of asserting, but will this really allow the code to work or will it just block waiting for a reply that won't ever come in this case? IOW do we have a real problem with the current sources (i.e. without the patch) or is the only problem that it asserts?

We also could apply the patch but call WSAAsyncSelect() to turn off the notification before calling recv(MSG_PEEK) and turn it back on after doing it. This would be pretty ugly and I'm not even sure if it's going to fix the problem though :-(

My main problem remains that I can't reproduce the bug reliably. Well, to be honest I didn't try since quite some time so I'll try again soon (before 2.9.1...) but it would be really great to be able to do it in the console socket (or ipc) sample. Why doesn't the problem happen here? Pegasi: if you want to help with fixing this bug, creating the smallest possible patch to the basesocket (or baseipc) sample reproducing the bug would be really, really helpful.

comment:14 in reply to: ↑ 13 ; follow-up: Changed 4 years ago by alarsen

Replying to vadz:

I guess we could simply ignore read reentrancy instead of asserting, but will this really allow the code to work or will it just block waiting for a reply that won't ever come in this case?
IOW do we have a real problem with the current sources (i.e. without the patch) or is the only problem that it asserts?

Ignoring the assert causes the application to hang, so the problem is indeed real.

We also could apply the patch but call WSAAsyncSelect() to turn off the notification before calling recv(MSG_PEEK) and turn it back on after doing it. This would be pretty ugly and I'm not even sure if it's going to fix the problem though :-(

We'd have to do it around every recv() but the last one, since we read messages as several parts (ADVISE and POKE are the worst - we call recv() six times to read them) :-/

My main problem remains that I can't reproduce the bug reliably. Well, to be honest I didn't try since quite some time so I'll try again soon (before 2.9.1...) but it would be really great to be able to do it in the console socket (or ipc) sample. Why doesn't the problem happen here?

But indeed it does (at least on WinXP), see comment:4

Pegasi: if you want to help with fixing this bug, creating the smallest possible patch to the basesocket (or baseipc) sample reproducing the bug would be really, really helpful.

comment:15 in reply to: ↑ 14 ; follow-up: Changed 4 years ago by vadz

Replying to alarsen:

Replying to vadz:

I guess we could simply ignore read reentrancy instead of asserting, but will this really allow the code to work or will it just block waiting for a reply that won't ever come in this case?
IOW do we have a real problem with the current sources (i.e. without the patch) or is the only problem that it asserts?

Ignoring the assert causes the application to hang, so the problem is indeed real.

It does this now because the code does reenter Read() after assert currently. I meant that we should simply return immediately when we detect reentrancy.

We also could apply the patch but call WSAAsyncSelect() to turn off the notification before calling recv(MSG_PEEK) and turn it back on after doing it. This would be pretty ugly and I'm not even sure if it's going to fix the problem though :-(

We'd have to do it around every recv() but the last one, since we read messages as several parts (ADVISE and POKE are the worst - we call recv() six times to read them) :-/

And we don't know which one is the last one, do we?

My main problem remains that I can't reproduce the bug reliably. Well, to be honest I didn't try since quite some time so I'll try again soon (before 2.9.1...) but it would be really great to be able to do it in the console socket (or ipc) sample. Why doesn't the problem happen here?

But indeed it does (at least on WinXP), see comment:4

Oops, I've missed this somehow. So what exactly do I need to do to see the problem (I must be doing something wrong if everybody but me sees it...)?

comment:16 in reply to: ↑ 15 ; follow-up: Changed 4 years ago by alarsen

Replying to vadz:

Ignoring the assert causes the application to hang, so the problem is indeed real.

The samples do not hang - real production code does, perhaps because it exchanges larger (more than the network's MTU) messages, or perhaps because it has several connections active at once. Might even be timing-related.

It does this now because the code does reenter Read() after assert currently. I meant that we should simply return immediately when we detect reentrancy.

In that case I think the right place to detect (and prevent) reentrancy would be in
wxTCPEventHandler::Client_OnRequest(), e.g. something like this ugly hack (which to me looks as if it would solve the issue, but in fact does not!???):

  • include/wx/socket.h

    a b private: 
    277277 
    278278    friend class wxSocketReadGuard; 
    279279    friend class wxSocketWriteGuard; 
     280    friend class wxTCPEventHandler; 
    280281 
    281282    wxDECLARE_NO_COPY_CLASS(wxSocketBase); 
    282283    DECLARE_CLASS(wxSocketBase) 
  • src/common/sckipc.cpp

    a b void wxTCPEventHandler::Client_OnRequest(wxSocketEvent &event) 
    724724        return; 
    725725    } 
    726726 
     727    // Recursive call? 
     728    if (sock->m_reading) 
     729        return; 
     730 
    727731    // Receive message number. 
    728732    wxIPCSocketStreams * const streams = connection->m_streams; 
    729733 

(with the above applied, the patch from comment:10 is no longer needed - but as I stated, I still get the reentrancy assert)

We'd have to do it around every recv() but the last one, since we read messages as several parts (ADVISE and POKE are the worst - we call recv() six times to read them) :-/

And we don't know which one is the last one, do we?

In fact we do - it all happens in wxTCPEventHandler::Client_OnRequest()

My main problem remains that I can't reproduce the bug reliably. Well, to be honest I didn't try since quite some time so I'll try again soon (before 2.9.1...) but it would be really great to be able to do it in the console socket (or ipc) sample. Why doesn't the problem happen here?

But indeed it does (at least on WinXP), see comment:4

BTW, the patch wxIdleEvent-MSW.patch mentioned in that comment is still (as of r64327) needed (here) to use the console ipc samples. Without it, nothing happens after they connect.

comment:17 Changed 4 years ago by Pegasi

Replying to vadz:

Pegasi: if you want to help with fixing this bug, creating the smallest possible patch to the basesocket (or baseipc) sample reproducing the bug would be really, really helpful. Oops, I've missed this somehow. So what exactly do I need to do to see the problem (I must be doing something wrong if everybody but me sees it...)?

I do really want to help with fixing this issue. I catch this problem just running GUI client.cpp/server.cpp socket example (not console) in the 2.9.0 (or later). Put a breakpoint in the server.cpp, line 441 (sock->Read(&c, 1);) and send a message from the client. Sooner or later you will see 2 entrances in wxSOCKET_INPUT for one message. I did not try any wxIPC-related code for the bug at all.



I've tried the very ugly fix yesterday, by uninstalling callbacks (wxApp::GetTraits()->GetSocketManager()->Uninstall_Callbacks(mySocket->getSocketImpl())) before the first read in wxSOCKET_INPUT code and installing them back before the last read (message body). This was done without the patch comment:ticket:11528:10]. I've got stale event eventually. Next, I've included a patch and wrapped it in uninstall/install callbacks - this led to the coming of multiple FD_READ with data as if in the issue without any patches. It seems that such approch did not give the desired effect so far. The side effect of callbacks uninstalling/installing is that the FD_WRITE comes (it posts by calling WSAAsyncSelect installation of the callback), but is discarded due to no wxSOCKET_OUTPUT flag presence in the users code.

comment:18 in reply to: ↑ 16 ; follow-up: Changed 4 years ago by Pegasi

  • Patch unset

Replying to vadz:

Pegasi: if you want to help with fixing this bug, creating the smallest possible patch to the basesocket (or baseipc) sample reproducing the bug would be really, really helpful. Oops, I've missed this somehow. So what exactly do I need to do to see the problem (I must be doing something wrong if everybody but me sees it...)?

I do really want to help with fixing this issue. I catch this problem just running GUI client.cpp/server.cpp socket example (not console) in the 2.9.0 (or later). Put a breakpoint in the server.cpp, line 441 (sock->Read(&c, 1);) and send a message from the client. Sooner or later you will see 2 entrances in wxSOCKET_INPUT for one message.

I've tried the very ugly fix yesterday, by uninstalling callbacks (wxApp::GetTraits()->GetSocketManager()->Uninstall_Callbacks(mySocket->getSocketImpl())) before the first read in wxSOCKET_INPUT code and installing them back before the last read (message body). This was done without the patch comment:ticket:11528:10]. I've got stale event eventually. Next, I've included a patch and wrapped it in uninstall/install callbacks - this led to the coming of multiple FD_READ with data as if in the issue without any patches. It seems that such approch did not give the desired effect so far. The side effect of callbacks uninstalling/installing is that the FD_WRITE comes (it posts by calling WSAAsyncSelect installation of the callback), but is discarded due to no wxSOCKET_OUTPUT flag presence in the users code.

@alarsen:

Actually, I do not use any wxIPC-related code at all and have this issue. Does sockets_vc8_client.vcproj or sockets_vc8_server.vcproj have any connection with ipc at all? Because I reproduce a bug in these 2 projects.

comment:19 in reply to: ↑ 18 Changed 4 years ago by alarsen

  • Keywords wxSocket added

Replying to Pegasi:

Actually, I do not use any wxIPC-related code at all and have this issue. Does sockets_vc8_client.vcproj or sockets_vc8_server.vcproj have any connection with ipc at all? Because I reproduce a bug in these 2 projects.

wxIPC uses wxSockets for its message transport, that's the relation.
The fact that you see the problem using only wxSockets shows beyond doubt that my proposed fix from comment:16 is the wrong approach (but it didn't fix my problem anyway).

comment:20 follow-up: Changed 4 years ago by alarsen

I think I begin to grok what's going on now; the patch from comment:10 was almost right, except that it caused extraneous FD_READ events.

@Pegasi: Could you please try if this alternative hack solves your problem?
(select() shouldn't re-enable FD_READ)

  • src/msw/sockmsw.cpp

    a b LRESULT CALLBACK wxSocket_Internal_WinProc(HWND hWnd, 
    338338 
    339339        switch WSAGETSELECTEVENT(lParam) 
    340340        { 
     341            fd_set fds; 
     342            struct timeval t; 
     343 
    341344            case FD_READ: 
     345                FD_ZERO(&fds); 
     346                FD_SET(socket->m_fd, &fds); 
     347                t.tv_sec = t.tv_usec = 0; 
     348                if (select(socket->m_fd + 1, &fds, NULL, NULL, &t) <= 0) 
     349                    return 0;    // no data available to read 
    342350                event = wxSOCKET_INPUT; 
    343351                break; 
    344352 

comment:21 in reply to: ↑ 20 Changed 4 years ago by Pegasi

Replying to alarsen:

@Pegasi: Could you please try if this alternative hack solves your problem? (select() shouldn't re-enable FD_READ)

I could not reproduce the bug with your last patch. Seems it works :)

comment:22 Changed 4 years ago by vadz

Thanks for debugging this!

I'll apply the patch from the comment:20 if nobody has any better ideas about how to avoid this dummy FD_READ notification in the first place.

comment:23 Changed 4 years ago by xStream

I have same issue. In my project it happens when one 'server-side' connection have OnExecute event and do Request via another 'client-side' connection in this event : client ->>Execute>>- connect-server ->>Request>>- data-server.

So I have assert in 'Request' call. After that connect sever stops it's network activity at all. No OnExecute, no Request any more...

Patch from comment:ticket:11528:10 partially helps - assert comes much more rearly.

Patch from comment:ticket:11528:20 make connect-server does not recieve messages at all.

Now I trying to make sample to reproduce bug here if it still needed here...

comment:24 follow-up: Changed 4 years ago by vadz

If the patch from comment:20 doesn't work for you, I'd be very much interested in having an example showing this.

TIA!

comment:25 in reply to: ↑ 24 Changed 4 years ago by xStream

Replying to vadz:

If the patch from comment:20 doesn't work for you, I'd be very much interested in having an example showing this. TIA!

I'm working on it. But I was wrong - sometimes there are 1-2 responses comes from srver and only after that events 'gone'. But almost always - I even get no responses at all.
No firewals etc.

comment:26 in reply to: ↑ 3 Changed 4 years ago by vadz

Replying to alarsen:

Replying to vadz:

when MyFrame::OnRequest() calls Request(), it's perfectly possible that another click event is generated from inside this Request() call as it operates in non-blocking mode.

That's entirely possible of course, but it doesn't seem to be the cause of my problem:

To rule out that MyFrame::OnRequest() was called recursively I added a wxASSERT to the sample (see ipcclient.patch) and waited for the answer to appear in the log window before clicking request a second time; my wxASSERT is not triggered but the original wxASSERT is still provoked after 2 or 3 clicks.

I looked at this again and I slightly misunderstood the mechanism the first time. What happens is that when a pending event (from previous call to OnRequest()) is being dispatched, wxSocket::Read() is called. Being a dangerous function that it is, it continues to dispatch the events from inside it. And it's like this that it calls MyFrame::OnRequest() again. I.e. here is the extract of some frames from the call stack (more recent are first):

 	ipcclient.exe!wxOnAssert(const char * file=0x006887d8, int line=0x000000cf, const char * func=0x006887f4, const char * cond=0x0068881c, const char * msg=0x00688834)  Line 1129	C++
 	ipcclient.exe!wxSocketReadGuard::wxSocketReadGuard(wxSocketBase * socket=0x02346cd0)  Line 207 + 0x36 bytes	C++
 	ipcclient.exe!wxSocketBase::Read(void * buffer=0x00a1e52b, unsigned int nbytes=0x00000001)  Line 944 + 0xc bytes	C++
...
	ipcclient.exe!MyFrame::OnRequest(wxCommandEvent & __formal={...})  Line 369 + 0x45 bytes	C++
...
 	ipcclient.exe!wxWndProc(HWND__ * hWnd=0x001c1294, unsigned int message=0x00000202, unsigned int wParam=0x00000000, long lParam=0x0008003d)  Line 2735 + 0x1e bytes	C++
...
 	ipcclient.exe!wxGUIEventLoop::ProcessMessage(tagMSG * msg=0x00a1f398)  Line 273 + 0xc bytes	C++
 	ipcclient.exe!wxGUIEventLoop::DispatchTimeout(unsigned long timeout=0x00092714)  Line 345 + 0x13 bytes	C++
 	ipcclient.exe!wxSocketBase::DoWait(long timeout=0x000927c0, int flags=0x00000001)  Line 1436 + 0x13 bytes	C++
 	ipcclient.exe!wxSocketBase::DoWaitWithTimeout(int flags=0x00000001)  Line 235	C++
 	ipcclient.exe!wxSocketBase::DoRead(void * buffer_=0x00a1f4e3, unsigned int nbytes=0x00000001)  Line 987 + 0xa bytes	C++
 	ipcclient.exe!wxSocketBase::Read(void * buffer=0x00a1f4e3, unsigned int nbytes=0x00000001)  Line 946 + 0x10 bytes	C++
...
 	ipcclient.exe!wxIPCSocketStreams::Read8()  Line 219	C++
 	ipcclient.exe!wxTCPEventHandler::Client_OnRequest(wxSocketEvent & event={...})  Line 735 + 0x8 bytes	C++
...
 	ipcclient.exe!wxEventLoopManual::Run()  Line 159 + 0x8 bytes	C++
 	ipcclient.exe!wxAppConsoleBase::MainLoop()  Line 318 + 0x27 bytes	C++
 	ipcclient.exe!wxAppConsoleBase::OnRun()  Line 259 + 0x12 bytes	C++
 	ipcclient.exe!wxAppBase::OnRun()  Line 286	C++
 	ipcclient.exe!wxEntryReal(int & argc=0x00000001, wchar_t * * argv=0x022ee890)  Line 473 + 0x1d bytes	C++
 	ipcclient.exe!wxEntry(int & argc=0x00000001, wchar_t * * argv=0x022ee890)  Line 189 + 0xd bytes	C++
 	ipcclient.exe!wxEntry(HINSTANCE__ * hInstance=0x002a0000, HINSTANCE__ * __formal=0x00000000, HINSTANCE__ * __formal=0x00000000, int nCmdShow=0x00000001)  Line 415 + 0x10 bytes	C++
 	ipcclient.exe!WinMain(HINSTANCE__ * hInstance=0x002a0000, HINSTANCE__ * hPrevInstance=0x00000000, char * __formal=0x00ac4364, int nCmdShow=0x00000001)  Line 46 + 0x16 bytes	C++

The trouble is that the sample code has no way to protect against this AFAICS. And I also don't think the patch in comment:20 can help here. So I wonder if I'm actually seeing the same bug the rest of you do at all?

comment:27 Changed 4 years ago by vadz

  • Resolution set to fixed
  • Status changed from accepted to closed

Summary:

  • The problem of comment:23 was due to something else (socket code is not reentrant so you can't perform any socket operations, even on another socket, from a socket handler).
  • The problem with lack of idle events in the console applications remains mysterious, I'm pretty sure that it happens because the message queue never becomes empty because of the incoming timer events but I don't understand how is it possible as the timer interval is 1 second and it seems impossible that the program doesn't have time to process all messages (and so become idle) during this time. It almost looks like PeekMessage() is blocking but it shouldn't do this, of course... Anyhow, for now I'll change the sample to stop the timer once it doesn't need it any more, this stops flooding the queue with the timer events and everything does work.
  • The reasons of the main problem are also still unclear. I can definitely confirm that we get phantom FD_READ notifications from Windows but still no idea why do we get them. MSDN clearly says, in WSAAsyncSelect() description, that the notifications are level-triggered which seems to imply to me that they should only be generated when the incoming buffer is non-empty but, again, this just doesn't seem to be the case. Ignoring these spurious notifications seems to be the right thing to do and so while I'm not sure if there is no better fix to the problem which would avoid them in the first place, for now I'm just going to commit this.

Thanks to everybody for testing/debugging/fixing and to Anders in particular!

P.S. I suggest opening new bugs for any further problems with sockets/IPC as this one becomes too long and unwieldy. Thanks!

comment:28 Changed 4 years ago by VZ

(In [64565]) Avoid sending spurious socket read notifications in wxMSW.

If a read notification is generated for a socket, it should be possible to
read something from it without blocking but this doesn't seem to be always the
case under MSW for some reason. And this results in all sorts of problems in
wxSocket and wxIPC code, so check for this at wxSocketImpl level and not send
the notification at all if there is no data to read.

See #11528.

comment:29 Changed 4 years ago by VZ

(In [64566]) Stop the timer in console IPC client sample once we don't need it any more.

Under MSW the timer appeared to be flooding the message queue with timer
events faster than we could process them (which seems incredible for the timer
interval of 1 second but still seems to happen), so the idle events were never
generated and the sample didn't work at all.

Now stop the timer once we get a last notification from it to let the program
become idle and run the test function scheduled from the timer handler.

See #11528.

Note: See TracTickets for help on using tickets.