Opened 4 years ago

Closed 4 years ago

Last modified 7 weeks ago

#12636 closed defect (fixed)

Blocking pipe write in Unix?

Reported by: ryazanov Owned by: vadz
Priority: normal Milestone:
Component: base Version: stable-latest
Keywords: Cc:
Blocked By: Blocking:
Patch: no

Description

It seems, that wxProcess/wxExecute in Unix do not make the output pipe (connected to process' stdin) non-blocking. So, contrary to wxOutputStream::Write documentation (In some cases (for example a write end of a pipe which is currently full) it is even possible that there is no errors and zero bytes have been written.), an attempt to write to a full pipe can create a deadlock -- child process cannot read from the input pipe, if its output pipe is full, and that pipe can't be read out, because the main program waits in the Write().

Supporting evidence:

  1. I didn't find any kind of O_NONBLOCK in the relevant source files.
  2. The deadlock occurs for bzip2, which operates on large blocks of data, and does not occur for gzip, which uses smaller blocks.
  3. Addition of delays (wxMilliSleep(100)) after each Write() removes the deadlock, presumably giving bzip2 enough time to process the data and start output before the input pipe fills up.

P.S. The problem does not appear in wxMSW (and there is PIPE_NOWAIT in msw/utilsexc.cpp).

Attachments (2)

wxFile.diff download (5.3 KB) - added by ryazanov 4 years ago.
wxExecute.diff download (7.4 KB) - added by ryazanov 4 years ago.

Download all attachments as: .zip

Change History (22)

comment:1 Changed 4 years ago by vadz

  • Component changed from wxGTK to base
  • Owner set to vadz
  • Status changed from new to accepted
  • Version changed from 2.8.11 to 2.9-svn

Yes, I think you're right and there is a real bug here. I'll commit the fix adding O_NONBLOCK soon, please test it and reopen the ticket if you still have any problems.

Thanks!

comment:2 Changed 4 years ago by VZ

(In [65992]) Refactor: extract code to make an fd non-clocking into a function.

Simply extract part of the code from evtloopunix.cpp into a reusable
wxPipe::MakeNonBlocking() function to be able to reuse it elsewhere.

See #12636.

comment:3 Changed 4 years ago by VZ

  • Resolution set to fixed
  • Status changed from accepted to closed

(In [65993]) Make write end of the child process pipe non-blocking under Unix.

We need to make at least one end of the pipe used to communicate with
wxExecute() child process non-blocking to avoid deadlocks, so unblock the
write end of the pipe. It seems to be unnecessary to unblock the reading ends
of std{out,err} pipes as we can already check for the presence of input there.
This is also consistent with wxMSW behaviour.

Closes #12636.

comment:4 Changed 4 years ago by ryazanov

Honestly, I didn't test the fix in the trunk, but setting O_NONBLOCK in wxW 2.8.11 (through a dirty hack by inheritance and cast to access the protected file descriptor) removes the locking problem.

However, it now reveals another nasty behavior. When a write is attempted to a full pipe, an error is set on the stream. I can't assert, that it is wrong by itself, but the bad thing is that it cannot be cleared! wxStreamBase::Reset() (which perhaps must be more properly called like ClearError()) is not documented, and even it does not help, since each OnSysWrite sets m_lasterror based on m_file->Error(), which in turn is only set in wxFile::Write and never reset (moreover, there are no means to clear it externally).

It think, it all should be reworked somehow.

The best solution to the pipe problem, in my opinion, is to not treat EAGAIN (or what it is?) as an error: there is nothing really wrong if the pipe is full, there is no need to report it via wxLogSysError as wxFile::Write does now, and wxOutputStream::LastWrite() == 0 is already a good indication of the situation. (Also, in wxMSW no error is produced, which is again kind of consistent with the wxOutputStream::Write documentation.)

As for wxOutputStream and wxFile in general, they might need an addition of better error handling mechanisms for non-fatal errors.

I don't reopen this ticket and don't create any new, leaving the right decision for you. :-)

comment:5 follow-up: Changed 4 years ago by vadz

  • Resolution fixed deleted
  • Status changed from closed to reopened

Thanks for testing, we definitely need to do something about this. I think we should either indeed never treat EAGAIN as error at wxFileOutputStream level or create a specific wxPipeOutputStream which would do this and use it here, as MSW already does. I don't really know how were the stream classes supposed to deal with this (my guess would be that nobody had thought about it) but creating wxPipeOutputStream should definitely be safe and correct so IMO we should do this for now.

It should be pretty simple to do but I have no code to test this so if you could provide an already tested patch adding it, it would be great.

comment:6 in reply to: ↑ 5 Changed 4 years ago by ryazanov

OK, I've started.

The idea is first to improve wxFile by adding GetLastError() and ClearError() methods -- it should not break anything, but might by helpful for other uses as well (right now it report error information only to the log).

ClearError() can be also added to wxStreamBase, though it's not absolutely necessary for the fix.

Then a new class wxPipeOutputStream is to be introduced, which simply inherits from wxFileOutputStream and just overloads its OnSysWrite method to suppress the unwanted errors (EAGAIN and EWOULDBLOCK) from wxFile.

What do you think?

P.S. I have a stupid question: how to download the current trunk version without SVN? The link http://wxwindows.sourceforge.net/snapshots/ redirects to http://biolpc22.york.ac.uk/pub/Daily_HEAD/, and the latest files there are wxWidgets-2010-06-30 (updated_at.txt says about Nov 4, but the files are definitely older), which naturally do not contain your last changes.

comment:7 Changed 4 years ago by ryazanov

Refinement of the last question: Download everything, not just separate files. Sorry for offtopic. :-)

comment:8 Changed 4 years ago by vadz

Sorry, the snapshots generation process broke down since quite some time so unfortunately right now using svn is the only possibility (well, you could also use git mirror at www.gitorious.org if you prefer).

Other than that, adding wxFile::ClearError() and GetLastError() looks fine. Just please remember to document them by adding their descriptions to interface/wx/file.h (with @since 2.9.2 lines in the body to indicate that they're new) as they could be generally useful. TIA!

Changed 4 years ago by ryazanov

comment:9 Changed 4 years ago by ryazanov

  • Patch set

First part: wxFile patch.

Changes:

bool m_error was replaced with int m_lasterror which now holds errno value of the last unsuccessful operation.

Two new methods were added:

  • int GetLastError() const which simply returns m_lasterror,
  • void ClearError() which clears the error code (sets m_lasterror = 0).

All (hopefully) methods, that can fail, now set m_lasterror = errno in addition to reporting errors to the log.

Notes:

  1. Error code is not set in methods declared const. It obviously can't be done properly. I didn't remove such constness being afraid to break some code relying on it.
  2. Now wxFileOffset Length() const internally calls const_cast<wxFile *>(this)->SeekEnd() and ((wxFile *)this)->Seek(), which are not const. Same story with bool Eof() const. This is not good, I think. Maybe, we should really drop const in such cases and then make the error reporting uniform.
  3. bool Error() const method can be removed. It was not documented anyway, and the new int GetLastError() const can be used instead.

Changed 4 years ago by ryazanov

comment:10 follow-up: Changed 4 years ago by ryazanov

Second part: patch for wxExecute (and related stuff) in UNIX.

A special class wxPipeOutputStream was intruduced to manage "pipe is full" errors.

HAS_PIPE_INPUT_STREAM was renamed to HAS_PIPE_STREAMS to better reflect its meaning. It doesn't seem to be used outside include/wx/unix/pipe.h, so this shouldn't change anything.

I have added wxLog:FlushActive() after wxLogSysError about non-blocking mode -- the reason is that if the program might hang, it might also do it before showing the message...


In the process of testing I discovered another bug, added in rev. 57324: when more than one child is running, each subsequent process inherits files from all the preceding ones. It leads to very undesired effects. For example, wxProcess::CloseOutput() can't really close the pipe, except for the very last process, since other child processes hold its descriptor. And process termination detection is probably broken similarly.

To fix the problem I have resurrected the relevant code from rev. 57023, but have moved it after redirection and modified to keep only the standard streams and the end process detection pipe open. Now several child processes can run independently, and hopefully "buggy children (e.g. xboard)" should not be affected, though it needs to be tested.

By the way, the method of closing file descriptors by counting from 0 to FD_SETSIZE doesn't seem to be the best. First, it might take a while, if FD_SETSIZE is not small. Second, it does not close descriptors above that value (also, wxPipeInputStream::CanRead() uses select() which has the same problem -- maybe it's better to use poll() there?).

comment:11 Changed 4 years ago by ryazanov

  • Patch unset

Oops. Forgot to add @since 2.9.2.

OT: Could you please add in HowToSubmitPatches a SVN snapshot link to http://gitorious.org/wxwidgets/wxwidgets/trees/master (or even the file directly: http://gitorious.org/wxwidgets/wxwidgets/archive-tarball/master)? Or the snapshots generation now works and will not break any more? :-)

comment:12 Changed 4 years ago by vadz

  • Status changed from reopened to accepted

Thanks, I'm going to apply the first of the patch and will look into the second one a bit later. I did some minimal changes, please let me know if I broke anything (it would be nice to have unit tests for the new functions to be sure about it).

comment:13 Changed 4 years ago by VZ

(In [66150]) Add wxFile::{Get,Clear}LastError() functions.

Remember the errno of the last file operation instead of just remembering
whether there was an error or not.

See #12636.

comment:14 in reply to: ↑ 10 Changed 4 years ago by vadz

Replying to ryazanov:

Second part: patch for wxExecute (and related stuff) in UNIX.

A special class wxPipeOutputStream was intruduced to manage "pipe is full" errors.

Thanks, I'll apply this soon.

HAS_PIPE_INPUT_STREAM was renamed to HAS_PIPE_STREAMS

Applied as well.

In the process of testing I discovered another bug, added in rev. 57324: when more than one child is running, each subsequent process inherits files from all the preceding ones. It leads to very undesired effects.

Ugh. Indeed, thanks for noticing this, I have no idea what was I thinking when committing r57324. It definitely needs to be fixed, thanks again for finding and debugging this!

To fix the problem I have resurrected the relevant code from rev. 57023, but have moved it after redirection and modified to keep only the standard streams and the end process detection pipe open. Now several child processes can run independently, and hopefully "buggy children (e.g. xboard)" should not be affected, though it needs to be tested.

Yes, this looks good to me.

By the way, the method of closing file descriptors by counting from 0 to FD_SETSIZE doesn't seem to be the best. First, it might take a while, if FD_SETSIZE is not small. Second, it does not close descriptors above that value

It definitely looks suspicious to me but there just doesn't seem to be any portable better way so it's not easy to fix unfortunately. This discussion mentions a few better but non-portable ones and, while searching for this, I also found getdtablesize(2) but it's not clear if it's really better to use than FD_SETSIZE or OPEN_MAX.

For Linux the best solution seems to iterate over /proc/self/fd entries. Any patches implementing this (better) approach would be welcome, of course.

(also, wxPipeInputStream::CanRead() uses select() which has the same problem -- maybe it's better to use poll() there?).

Sorry, I don't see the problem here: it tests a single descriptor only and FD_SETSIZE applies to the number of descriptors, not their values. Am I missing something?

comment:15 Changed 4 years ago by VZ

(In [66151]) No real changes, just reamed HAS_PIPE_INPUT_STREAM.

Renamed the symbol indicating whether pipe-based streams are available from
HAS_PIPE_INPUT_STREAM to HAS_PIPE_STREAMS as it's not really input-specific.

See #12636.

comment:16 Changed 4 years ago by VZ

(In [66152]) Fix spurious errors when writing to the child process stdin under Unix.

Since the child pipe was made non-blocking in r65993, it became possible to
write to child process without deadlocking when the pipe became full. However
this still resulted in an error from wxFileOutputStream as it didn't handle
EAGAIN returned from write() any differently than any other error, even though
it is an expected situation in this particular case.

Change Unix wxExecute() to use wxPipeOutputStream which ignores EAGAIN unlike
wxFileOutputStream to fix this.

See #12636.

comment:17 Changed 4 years ago by VZ

  • Resolution set to fixed
  • Status changed from accepted to closed

(In [66153]) Restore code for closing inherited file descriptors in the child.

The code closing all file descriptors inherited from the parent in the child
process created by wxExecute() was removed in r57324 by mistake (probably
due the fact that its meaning was poorly explained) but we still do need to do
this, of course, to avoid descriptor "leaks" (e.g. the parent couldn't really
close any of them).

Restore the code for closing all unneeded file descriptors in the child in
slightly modified form and add a comment pointing to an URL explaining how to
do it better in the future.

Closes #12636.

comment:18 Changed 4 years ago by ryazanov

Sorry, I don't see the problem here: it tests a single descriptor only and FD_SETSIZE applies to the number of descriptors, not their values. Am I missing something?

In fact, values. man select:

An fd_set is a fixed size buffer. Executing FD_CLR() or FD_SET() with a value of fd that is negative or is equal to or larger than FD_SETSIZE will result in undefined behavior.

Some implementations use a bit array for descriptors from 0 to FD_SETSIZE-1 and don't check bounds. In that case "undefined behavior" means "memory corruption".

As I see, all this business with file descriptors is rather strange. Perhaps, leaving things as is, with warning/comment in the code, is good enough for the moment. It might be added to some "to do list", if such exists. :-)

As for the unit tests -- is there any guide to read?

comment:19 Changed 4 years ago by vadz

Ok, thanks, I didn't know about this.

Unit tests: we have docs/tech/tn0017.txt and I think just looking at the existing ones is often quite useful. TIA!

comment:20 Changed 7 weeks ago by VZ

In 78086:

Restore code for closing inherited file descriptors under non-OS X.

The code for doing this was accidentally disabled in r74957, which intended to
disable it for Darwin, but ended up disabling it for everything but Darwin.

Reenable it for all platforms now, clearly it didn't hurt to have it enabled
under Darwin, and we do need to do it, e.g. see #12636.

Note: See TracTickets for help on using tickets.