The osutil yakshave
I’ve been planning to write a blog post about Mutation Testing, and finally got around to it a couple of weeks ago. I set up my example, and looked for publicly available mutation testing tools for my programming language of choice, Go (I get to use it quite often as an engineer at Google). The best-maintained one appears to be go-mutesting, so I figured I’d try it out. Unfortunately, I ran into a build issue with one of its dependencies:
[Build error output not preserved.]
Yak shaving time! This was covered in zimmski/osutil#8, which showed it was an incompatibility with Go 1.10.
Testing the issue
It turns out there’s a really convenient way to check this using Docker (which I finally took the time to learn for the umpteenth iteration of the show downloading stack):
[Docker command not preserved.]
This will download a minimal image for getting (using git), building, and running Go code, extract it, get the zimmski/osutil package, run the tests (successfully), and clean up after itself, leaving no trace on your system other than the cached base image for golang:1.9. Change the 1.9 to a 1.10 and the process will be identical, except for the version of Go, and it will fail. In my opinion that’s pretty astoundingly convenient, especially as the whole thing takes just under 38 seconds. We’re cheating here, of course: Docker needs to be preinstalled, and you could solve this in other ways (e.g. gvm). However, Docker is pretty ubiquitous nowadays (Google Cloud Shell conveniently includes it), and this method does have the benefit of testing on a completely clean image (no surprise dependencies).
We can use a similar technique to test our fix, once we have it: on the host, go get github.com/zimmski/osutil, and from the downloaded directory, run:
[Docker command not preserved.]
This will mount the current directory into the GOPATH of the Docker image (conveniently at /go), get the required dependencies, and run our tests. You could modify this one-liner to not remove the container every time, but seeing as it only takes 7 seconds on subsequent runs, I didn’t bother.
The issue and the fix
The root cause here is in capture.go, which provides the Capture and CaptureWithCGo functions. These get a func() callback, capture whatever it outputs to stdout and stderr, and return it as a string. The Capture function only works with pure Go code, and CaptureWithCGo is meant to support code that includes CGo as well. The latter assumes that the CGo code would use the C stdout and stderr globals (which are FILE * pointers used by functions such as printf and fprintf), so it creates a pipe and points stdout and stderr at it. This has two problems:
- Assigning to stdout and stderr is no longer allowed in Go 1.10 (and, according to golang/go#25221, was never intentionally allowed).
- Functions could output to standard output and error in other ways, such as calling external programs or using the write system call. This is true for the Capture function as well, but I wanted to modify as little behavior as possible.
Technically, the behavior-preserving solution could be to just use freopen instead, but I didn’t know about it at the time. In general, capturing output using redirects seems to me like it should capture all output, regardless of how it’s generated.
To accomplish this, let’s first have a look at how shells accomplish redirects.
[strace output not preserved.]
Here, strace runs a parent bash process (PID 20209, not shown in the trace above), which forks into PID 20210, which ultimately ends up running /bin/echo (and not the echo bash builtin). To accomplish the redirect, bash does the following:
- Open the requested file, which ends up being file descriptor 3.
- Use the dup2 system call to make file descriptor 1 (standard output) refer to the same open file as descriptor 3. Now this open file has two descriptors pointing at it.
- Close file descriptor 3; this reduces the number of file descriptors pointing at /tmp/redirected back to one.
- Finally, use execve to replace the running program with /bin/echo, which will (as always) output to file descriptor 1, which now points to /tmp/redirected.
No matter how echo internally causes output to appear (even if it ran yet another binary), the output would always go to /tmp/redirected.
It’s worth mentioning that the dup system call is similar to the dup2 system call, but the caller doesn’t choose the destination file descriptor; instead, the first available file descriptor is used and returned.
This technique is the basic one behind the fix. The old method was, roughly:
- Save the old os.Stdout, os.Stderr, C.stdout, and C.stderr objects
- Open a pipe; this gets you two file descriptors (w.Fd() and r.Fd())
- Point the Go objects os.Std{out,err} at w.Fd() by just assigning w to them
- Point the C objects C.std{out,err} at w.Fd() by opening it with fdopen and assigning the result to them. (This no longer works)
- Call the callback function
- Copy from the r end of the pipe to a buffer using io.Copy.
- When the method returns (using defer), restore the four objects we saved
The new technique is, roughly:
- Use syscall.Dup to save file descriptors 1 and 2 (standard output and error)
- Open the pipe as before
- Use syscall.Dup2 to overwrite file descriptors 1 and 2 with w.Fd()
- Call the callback function
- Close all instances of the w end of the pipe
- Copy from the r end of the pipe to a buffer using io.Copy.
- When the method returns, restore the original file descriptors 1 and 2
Closing all instances of the w end of the pipe means closing w.Fd(), syscall.Stdout, and syscall.Stderr. If any of those three stays open, the write end of the pipe still counts as open, the reader never sees end-of-file, and io.Copy will never return.
To demonstrate this, let’s take a look at a simplified version (no error handling, don’t try this at home):
[Simplified code listing not preserved.]
With closeStdout set to true, everything works correctly. However, with it false, io.Copy will hang.
The full code is in the playground, but because of golang/go#24610, for which the fix has yet to be rolled out, Dup2 will fail, so you need to copy the code over to your local machine.
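The hang described above can also be observed without actually hanging the program. The sketch below is separate from the playground code and rests on a few assumptions: the names drain, finishedWithin, and demo are hypothetical, a duplicate of the pipe's write end plays the role of descriptors 1 and 2, and the timeouts are arbitrary.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"syscall"
	"time"
)

// drain reads r until end-of-file in the background and closes the
// returned channel once io.Copy has returned.
func drain(r io.Reader) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		io.Copy(io.Discard, r)
		close(done)
	}()
	return done
}

// finishedWithin reports whether done was closed before d elapsed.
func finishedWithin(done <-chan struct{}, d time.Duration) bool {
	select {
	case <-done:
		return true
	case <-time.After(d):
		return false
	}
}

// demo reports whether the reader saw end-of-file before and after the
// last descriptor for the write end of the pipe was closed.
func demo() (before, after bool, err error) {
	r, w, err := os.Pipe()
	if err != nil {
		return false, false, err
	}
	defer r.Close()

	// A second descriptor for the write end, standing in for 1 and 2.
	extra, err := syscall.Dup(int(w.Fd()))
	if err != nil {
		w.Close()
		return false, false, err
	}

	fmt.Fprint(w, "hello")
	w.Close() // one write descriptor closed, the duplicate is still open

	done := drain(r)
	before = finishedWithin(done, 200*time.Millisecond)

	syscall.Close(extra) // now every write descriptor is closed
	after = finishedWithin(done, 5*time.Second)
	return before, after, nil
}

func main() {
	before, after, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, "demo failed:", err)
		os.Exit(1)
	}
	fmt.Println("io.Copy returned while a duplicate was open:", before)
	fmt.Println("io.Copy returned after closing the duplicate:", after)
}
```

The first line should report false and the second true: the reader only reaches end-of-file once the last descriptor referring to the write end is gone.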